**Springer Topics in Signal Processing** 

# Franz Zotter Matthias Frank

# Ambisonics

A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality

# Springer Topics in Signal Processing

Volume 19

#### Series Editors

Jacob Benesty, INRS-EMT, University of Quebec, Montreal, QC, Canada

Walter Kellermann, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

The aim of the Springer Topics in Signal Processing series is to publish very high quality theoretical works, new developments, and advances in the field of signal processing research. Important applications of signal processing will be covered as well. Within the scope of the series are textbooks, monographs, and edited books.

More information about this series at http://www.springer.com/series/8109

Franz Zotter • Matthias Frank

# Ambisonics

A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality

Franz Zotter Institute of Electronic Music and Acoustics University of Music and Performing Arts Graz, Austria

Matthias Frank Institute of Electronic Music and Acoustics University of Music and Performing Arts Graz, Austria

ISSN 1866-2609 ISSN 1866-2617 (electronic)

Springer Topics in Signal Processing

ISBN 978-3-030-17206-0 ISBN 978-3-030-17207-7 (eBook)

https://doi.org/10.1007/978-3-030-17207-7

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## Preface

The intention of this textbook is to provide a concise explanation of fundamentals and background of the surround sound recording and playback technology Ambisonics.

Although Ambisonic technology has been practiced in the academic world for quite some time, the recent ITU,<sup>1</sup> MPEG-H,<sup>2</sup> and ETSI<sup>3</sup> standards are now firmly establishing it in the production and media broadcasting world.

What is more, the Internet giants Google/YouTube recently recommended tools that were largely adopted from those currently used in the academic world.4,5

Last but most importantly, the boost given to Ambisonic technology by recent advancements has been in usability: ways to obtain safe Ambisonic decoders,6,7 the availability of higher-order Ambisonic main microphone arrays (Eigenmike,<sup>8</sup> Zylia<sup>9</sup>) and their filter-design theory, and above all, plugins integrating higher-order Ambisonic production into digital audio workstations or mixers.7,10,11,12,13,14,15 This progress was a great motivation to write a book about the basics.

<sup>1</sup> https://www.itu.int/rec/R-REC-BS.2076/en.

<sup>2</sup> https://www.iso.org/standard/69561.html.

<sup>3</sup> https://www.techstreet.com/standards/etsi-ts-103-491?product\_id=1987449.

<sup>4</sup> https://support.google.com/jump/answer/6399746?hl=en.

<sup>5</sup> https://developers.google.com/vr/concepts/spatial-audio.

<sup>6</sup> https://bitbucket.org/ambidecodertoolbox/adt.git.

<sup>7</sup> https://plugins.iem.at/.

<sup>8</sup> https://mhacoustics.com/products.

<sup>9</sup> https://www.zylia.co.

<sup>10</sup> http://www.matthiaskronlachner.com/?p=2015.

<sup>11</sup> http://www.blueripplesound.com/product-listings/pro-audio.

<sup>12</sup> https://b-com.com/en/bcom-spatial-audio-toolbox-render-plugins.

<sup>13</sup> https://harpex.net/.

<sup>14</sup> http://forumnet.ircam.fr/product/panoramix-en/.

<sup>15</sup> http://research.spa.aalto.fi/projects/sparta\_vsts/.

The book is dedicated to providing a deeper understanding of Ambisonic technologies, especially for, but not limited to, readers who are scientists, audio-system engineers, and audio recording engineers. As, from time to time, the underlying maths would get too long for practical readability, the book comes with a comprehensive appendix containing the beautiful mathematical details.

For a common understanding, the introductory section spans a perspective on Ambisonics from its origins in coincident recordings from the 1930s, to the Ambisonic concepts from the 1970s, and to classical ways of applying Ambisonics in first-order coincident sound scene recording and reproduction that have been practiced from the 1980s on.

In its main contents, this book intends to provide all psychoacoustical, signal processing, acoustical, and mathematical knowledge needed to understand the inner workings of modern processing utilities, special equipment for recording, manipulation, and reproduction in the higher-order Ambisonic format. As advanced outcomes, the aim of the book is to explain higher-order Ambisonic decoding, 3D audio effects, and higher-order Ambisonic recording with microphones or main microphone arrays. Those techniques are shown to be suitable to supply audience areas ranging from studio-sized to hundreds of listeners, or headphone-based playback, regardless whether it is live, interactive, or studio-produced 3D audio material.

The book comes with various practical examples based on free software tools and open scientific data for reproducible research.

Our Ambisonic event experience: In the past years, we have contributed to organizing Symposia on Ambisonics (Ambisonics Symposium 2009 in Graz, 2010 in Paris, 2011 in Lexington, 2012 in York, 2014 in Berlin), and demonstrated and brought the technology to various winter/summer schools and conferences (EAA Winter School Merano 2013, EAA Symposium Berlin 2014, workshops and Ambisonic music repertory demonstration at Darmstädter Ferienkurse für Neue Musik in 2014, ICAD workshop in Graz 2015, ICSA workshop 2015 in Graz with PURE Ambisonics night, summer school at ICSA 2017 in Graz, a course at the Kraków film music festival 2015, mAmbA demo facility at DAGA in Aachen 2016, Al Di Meola's live 3D audio concert hosted in Graz in June 2016, and AES Convention Milano 2018).

In 2017 (ICSA Graz) and 2018 (TMT Cologne), we initiated and organized Europe's First and Second Student 3D Audio Production Competition together with Markus Zaunschirm and Daniel Rudrich.

Graz, Austria
February 2019

Franz Zotter
Matthias Frank

# Acknowledgements

To our lab and colleagues: Traditionally at the Institute of Electronic Music and Acoustics (IEM), there had been a lot of activity in developing and applying Ambisonics by Robert Höldrich, Alois Sontacchi, Markus Noisternig, Thomas Musil, Johannes Zmölnig, and Winfried Ritsch, even before our active time of research. Most of the developments were done in pure-data, e.g., with [iem\_ambi], [iem\_bin\_ambi], [iem\_matrix], and the CUBEmixer. Colleagues who contributed a lot of skill to improving the usability of Ambisonics deserve to be mentioned: Hannes Pomberger and his mathematical talent, Matthias Kronlachner, who developed the ambix and mcfx VST plugin suites in 2014, and Daniel Rudrich, who developed the IEM Plugin Suite, which also involves technology elaborated together with our colleagues Markus Zaunschirm, Christian Schörkhuber, and Sebastian Grill.

We thank you all for your support; it's the best environment to work in!

First readers: We thank Archontis Politis (Aalto Univ., Espoo and Tampere Univ., Finland), Nicolas Epain (b<>com, France), and Matthias Kronlachner (Harman, Germany/US) for being our first critical readers and supplying us with valuable comments.

Open Access Funding: We are grateful for funding from our local government of Styria (Steiermark), Section 8, Office for Science and Research (Wissenschaft und Forschung), which covers roughly half the open-access publishing costs. We gratefully thank our university (University of Music and Performing Arts, "Kunstuni", Graz), its library in transition to open access, and its vice rectorate for research for covering the other half.

## Outline

First-order Ambisonics is nowadays strongly revived by internet technology: Google/YouTube and Facebook 360°, 360° audio and video recording and rendering, as well as VR in games. This renaissance lies in two benefits: (i) its compact main microphone arrays capture the entire surrounding sound scene in only four audio channels (e.g., Zoom H3-VR, Oktava A-Format Microphone, Røde NT-SF1, Sennheiser AMBEO VR Mic), and (ii) it easily permits rotation of the sound scene, allowing surround audio scenes to be rendered, e.g., on head-tracked headphones, head-mounted AR/VR sets, or mobile devices, as described in Chap. 1.

Auditory events and vector-base panning: Chapter 2 of this book is dedicated to conveying a comprehensive understanding of the localization impressions in multi-loudspeaker playback and its models, followed by Chap. 3 that outlines the essentials of practical vector panning models and their extensions by downmix from imaginary loudspeakers, which are both fundamental to contemporary Ambisonics.

Harmonic functions, Ambisonic encoding and decoding: Based on the ideals of accurate localization with panning-invariant loudness and perceived width, Chap. 4 provides a profound mathematical derivation of higher-order Ambisonic panning functions in 2D and 3D in terms of angular harmonics. These idealized functions can be maximized in their directional focus (max-rE), and they are strictly limited in their directional resolution. This resolution limit entails perfectly well-defined constraints on loudspeaker layouts that let us reach ideal measures for accurate localization as well as panning-invariant loudness and width. And, highly relevant for practical decoding: All-Round Ambisonic decoding to loudspeakers and TAC/MagLS decoders for headphones are explained in Chap. 4.

The Ambisonic signal processing chain and effects are described in Chap. 5. It illustrates the signal flow from source encoding through the Ambisonic bus to decoding, and where input-specific or general insert and auxiliary Ambisonic effects are located. In particular, the chapter describes the working principles behind frequency-independent manipulation effects that are either mirroring/rotating/re-mapping, warping, or directionally weighting, and effects that are frequency-dependent. Frequency-dependent effects can introduce widening, depth or diffuseness, convolution reverb, or feedback-delay-network (FDN)-based diffuse reverberation. Directional resolution enhancements are outlined in terms of SDM/SIRR pre-processing of recorded reverberation and in terms of available tools such as HARPEX, DirAC, and COMPASS for recorded signals.

Compact higher-order Ambisonic microphones rely on the solutions of the Helmholtz equation. Their processing uses a frequency-independent decomposition of the spherical array signals into spherical harmonics and the frequency-dependent radial-focusing filtering associated with each spherical-harmonic order, which together yield the Ambisonic signals. The critical part is to handle the properties of the radial-focusing filters in the processing of higher-order Ambisonic microphone arrays (e.g., the Eigenmike). To keep the noise level and the sidelobes of the recordings low and the frequency response balanced, a careful radial filter design is outlined in Chap. 6.

Compact higher-order loudspeaker arrays oppose the otherwise inwards-oriented Ambisonic surround playback, as described in Chap. 7. This outlook chapter discusses the IKO and loudspeaker cubes as compact spherical loudspeaker arrays with Ambisonically controlled radiation patterns. In natural environments with acoustic reflections, such directivity-controlled arrays have their own sound-projecting and distance-changing effects, and they can be used to simulate sources of specific directivity patterns.

# Contents






# **Chapter 1 XY, MS, and First-Order Ambisonics**

*Directionally sensitive microphones may be of the light moving strip type. […] the strips may face directions at* 45◦ *on each side of the centre line to the sound source.*

Alan Dower Blumlein [1], Patent, 1931

**Abstract** This chapter describes first-order Ambisonic technologies, starting from classical coincident audio recording and playback principles from the 1930s until the invention of first-order Ambisonics in the 1970s. Coincident recording is based on arrangements of directional microphones with the smallest-possible spacings in between. Thereby, incident sound arrives with approximately equal delay at all microphones. Intensity-based coincident stereophonic recording such as XY and MS typically yields stable directional playback on a stereophonic loudspeaker pair. While the stereo width is adjustable by MS processing, the directional mapping of first-order Ambisonics is a bit more rigid: the omnidirectional and figure-of-eight recording pickup patterns are reproduced unaltered by equivalent patterns in playback. In appreciation of the benefits of coincident first-order Ambisonic recording technologies in VR and field recording, the chapter gives practical examples for encoding and for headphone- and loudspeaker-based decoding. It concludes with a desire for a higher-order Ambisonic format to obtain a larger sweet area and to accommodate first-order resolution-enhancement algorithms, the embedding of alternative, channel-based recordings, etc.

Intensity-based coincident stereophonic recording such as XY uses two figure-of-eight microphones, after Blumlein's original work [1] from the 1930s, with an angular spacing of 90◦, see [2–4]. Another representative, MS, uses an omnidirectional and a lateral figure-of-eight microphone [2]. Both typically yield a stable directional playback in stereo, but their signals often get too correlated, yielding a lack of depth and diffuseness of the recording space when played back [5, 6] and compared to delay-based AB stereophony or equivalence-based alternatives.

Gerzon's work in the 1970s [7] gave us what we call first-order Ambisonic recording and playback technology today. Ambisonics preserves the directional mapping by recording and reproducing with spatially undistorted omnidirectional and figure-of-eight patterns on circularly (2D) or spherically (3D) surrounding loudspeaker layouts.

#### **1.1 Blumlein Pair: XY Recording and Playback**

The XY technique dates back to Blumlein's patent from the 1930s [1] and his patents thereafter [4]. At that time, manufacturers started producing ribbon microphones, nowadays outdated, that offered a means to record with figure-of-eight pickup patterns.

*Blumlein Pair using 90*◦*-angled figure-of-eight microphones (XY)*. Blumlein's classic coincident microphone pair [3, Fig. 3] uses two figure-of-eight microphones pointing to ±45◦, see Fig. 1.1. Its directional pickup pattern is described by cos φ, where φ is the angle enclosed by the microphone aiming and the sound source direction. Using a mathematically positive coordinate definition for X (front-right) and Y (front-left) and the polar angle ϕ = 0 aiming at the front, the figure-of-eight X uses the angle φ = ϕ + 45◦ and Y the angle φ = ϕ − 45◦, so that the pickup pattern of the microphone pair is:

$$\mathbf{g}\_{\rm XY}(\varphi) = \begin{bmatrix} \cos(\varphi + 45^{\circ}) \\ \cos(\varphi - 45^{\circ}) \end{bmatrix}. \tag{1.1}$$

Assuming a signal *s* coming from the angle ϕ, the signals recorded are [*X*, *Y*]<sup>T</sup> = *g*<sub>XY</sub>(ϕ) *s*. Sound sources from the left 45◦, the front 0◦, and the right −45◦ will be received by the pair of gains:

**Fig. 1.1** Blumlein pair consisting of 90◦-angled figure-of-eight microphones

$$\text{right:}\quad \mathbf{g}\_{\text{XY}}(-45^{\circ}) = \begin{bmatrix} 1\\0 \end{bmatrix}, \quad \text{center:}\quad \mathbf{g}\_{\text{XY}}(0^{\circ}) = \begin{bmatrix} \frac{1}{\sqrt{2}}\\\frac{1}{\sqrt{2}} \end{bmatrix}, \quad \text{left:}\quad \mathbf{g}\_{\text{XY}}(45^{\circ}) = \begin{bmatrix} 0\\1 \end{bmatrix}.$$

Obviously, a source moving from the right −45◦ to the left 45◦ pans the signal from the channel X to the channel Y. This property provides a strongly perceivable lateralization of lateral sources when feeding the left and right channel of a stereophonic loudspeaker pair by Y and X, respectively.

However, ideally there should not be any dominant sounds arriving from the sides, as for the source angles between −135◦ ≤ ϕ ≤ −45◦ and 45◦ ≤ ϕ ≤ 135◦ the Blumlein pair produces out-of-phase signals between X and Y. The back directions are mapped with consistent sign again, however, left-right reversed. It is only possible to avoid this by decreasing the angle between the microphone pair, which, however, would make the stereo image narrower.

Therefore, coincident XY recording pairs nowadays most often use cardioid directivities $\frac{1}{2} + \frac{1}{2}\cos\varphi$, instead. They receive all directions without sign change and easily permit stereo width adjustments by varying the angle between the microphones.
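The pickup gains of the Blumlein pair can be checked numerically. The following is a minimal sketch assuming NumPy; the function name `g_xy` is illustrative:

```python
import numpy as np

def g_xy(phi_deg):
    """Pickup gains of the Blumlein XY pair, Eq. (1.1)."""
    phi = np.radians(phi_deg)
    return np.array([np.cos(phi + np.pi / 4),   # X, aiming front-right
                     np.cos(phi - np.pi / 4)])  # Y, aiming front-left

print(np.round(g_xy(-45), 3))  # right source  -> [1. 0.]
print(np.round(g_xy(0), 3))    # center source -> [0.707 0.707]
print(np.round(g_xy(45), 3))   # left source   -> [0. 1.]
print(g_xy(90))                # lateral source: opposite signs, out of phase
```

For source angles between ±45◦ and ±135◦, the two gains take opposite signs, which is exactly the out-of-phase region discussed above.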

#### **1.2 MS Recording and Playback**

Blumlein's patent [1] considers sum and difference signals between a pair of channels/microphones, yielding M-S stereophony. In M-S [8], the sum signal represents the mid (omnidirectional, sometimes cardioid-directional to front) and the difference the side signal (figure-of-eight). MS recordings can also be taken with cardioid microphones and permit manipulation of the stereo width of the recording.

*MS recording by omnidirectional and figure-of-eight microphone (native MS)*. Mid-side recording can be done by using a pair of coincident microphones with an omnidirectional (mid, W) and a side-ways oriented figure-of-eight (side, Y) directivity, Fig. 1.2. The pair of pickup patterns is described by the vector:

**Fig. 1.2** Native mid-side recording with the coincident arrangement of an omnidirectional microphone heading front and a figure-of-eight microphone heading left

$$\mathbf{g}\_{\rm{WY}}(\boldsymbol{\varphi}) = \begin{bmatrix} 1 \\ \sin(\boldsymbol{\varphi}) \end{bmatrix} \tag{1.2}$$

that depends on the angle ϕ of the sound source. Equation (1.2) maps a single sound *s* from ϕ to the mid *W* and side *Y* signals by the gains [*W*, *Y*]<sup>T</sup> = *g*<sub>WY</sub>(ϕ) *s*

$$\text{left: } \mathbf{g\_{WY}}(90^\circ) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \text{right: } \mathbf{g\_{WY}}(-90^\circ) = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \text{center: } \mathbf{g\_{WY}}(0^\circ) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$

*MS recording with a pair of 180*◦*-angled cardioids*. Two coincident cardioid microphones (cardioid directivity $\frac{1}{2} + \frac{1}{2}\cos\varphi$) pointing to the polar angles 90◦ (left) and −90◦ (right) are also applicable to mid-side recording, Fig. 1.3. Their pickup patterns

$$\mathbf{g}\_{\mathbb{C}\pm90^\circ}(\varphi) = \frac{1}{2} \begin{bmatrix} 1 + \cos(\varphi - 90^\circ) \\ 1 + \cos(\varphi + 90^\circ) \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 + \sin(\varphi) \\ 1 - \sin(\varphi) \end{bmatrix} \tag{1.3}$$

are encoded into the MS pickup patterns (W,Y) by a matrix

$$\mathbf{g}\_{\rm WY}(\varphi) = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \mathbf{g}\_{\rm C\pm 90^{\circ}}(\varphi). \tag{1.4}$$

The matrix eliminates the cardioids' figure-of-eight characteristics by their sum signal, and their omnidirectional characteristics by the difference. We obtain the MS signal pair (W,Y) from the cardioid microphone signals as

$$
\begin{bmatrix} W \\ Y \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} C\_{90^\circ} \\ C\_{-90^\circ} \end{bmatrix} . \tag{1.5}
$$

(a) 180◦-angled cardioid microphones. (b) Picture of the recording setup

**Fig. 1.3** Mid-side recording by 180◦-angled cardioids

**Fig. 1.4** Change of the stereo width by modifying the balance between W and Y signals of MS (left). Decoding of the M/S signal pair (W, Y) to a stereo loudspeaker pair (right)

*Decoding of MS signals to a stereo loudspeaker pair*. Decoding of the mid-side signal pair to left and right loudspeaker is done by feeding both signals to both loudspeakers, however out-of-phase for the side signal, Fig. 1.4b:

$$
\begin{bmatrix} L \\ R \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} W \\ Y \end{bmatrix}. \tag{1.6}
$$

An interesting aspect of the 180◦-angled cardioid microphone MS is that, after inserting the cardioid-to-MS encoder Eq. (1.5) into the decoder Eq. (1.6), a brief calculation shows that the matrices invert each other. In this case, the cardioid signals are directly fed to the loudspeakers, [*L*, *R*] = [*C*<sub>90◦</sub>, *C*<sub>−90◦</sub>].
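This mutual inversion is quickly verified numerically; a small sketch assuming NumPy:

```python
import numpy as np

encode = np.array([[1, 1], [1, -1]])        # cardioids -> (W, Y), Eq. (1.5)
decode = 0.5 * np.array([[1, 1], [1, -1]])  # (W, Y) -> (L, R), Eq. (1.6)

# The product is the identity matrix, so the cardioid signals pass
# straight through to the loudspeakers: [L, R] = [C_90, C_-90].
print(decode @ encode)  # -> [[1. 0.], [0. 1.]]
```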

*Stereo width*. Modifying the mid versus side signal balance before stereo playback, using a blending parameter α, allows changing the width of the stereo image from α = 0 (narrow) to α = 1 (full), Fig. 1.4a, see also [9]:

$$
\begin{bmatrix} L \\ R \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 2 - \alpha & 0 \\ 0 & \alpha \end{bmatrix} \begin{bmatrix} W \\ Y \end{bmatrix}. \tag{1.7}
$$
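The effect of the blending parameter α in Eq. (1.7) can be sketched as follows (NumPy assumed; the function name is illustrative). α = 0 collapses the image to mono, with both loudspeakers receiving only W, while α = 1 restores the full-width decoding of Eq. (1.6):

```python
import numpy as np

def ms_width_decode(W, Y, alpha):
    """Stereo decoding with width control, Eq. (1.7)."""
    decode = 0.5 * np.array([[1, 1], [1, -1]])
    width = np.diag([2 - alpha, alpha])
    return decode @ width @ np.array([W, Y])

print(ms_width_decode(1.0, 0.5, 0.0))  # narrow -> [1. 1.], W only
print(ms_width_decode(1.0, 0.5, 1.0))  # full   -> [0.75 0.25], as in Eq. (1.6)
```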

In stereophonic MS playback, the playback loudspeaker directions at ±30◦ are not identical to the peaks of the recording pickup pattern of the side channel (Y) at ±90◦. Ambisonics assumes a stricter correspondence between the directional patterns of the recording and the patterns mapped on the playback system.

#### **1.3 First-Order Ambisonics (FOA)**

After Cooper and Shiga [10] worked on expressing panning strategies for arbitrary surround loudspeaker setups in terms of a directional Fourier series, the notion and technology of Ambisonics was developed by Fellgett [11], Gerzon [7], and Craven [12]. In particular, they were also considering a suitable recording technology.

Essentially based on similar considerations as MS, one can define first-order Ambisonic recording. For 2D recordings, a Double-MS microphone arrangement is suitable and requires only one more microphone than MS recording: a front-back-oriented figure-of-eight microphone. The scheme is extended to 3D first-order Ambisonics by a third figure-of-eight microphone with up-down aiming. First-order Ambisonics is oftentimes still the basis of today's virtual-reality applications and 360◦ audio streams on the internet. In addition to potential loudspeaker playback, it permits interactive playback on head-tracked headphones to render the acoustic sound scene static relative to the listener.

First-order Ambisonic recording has the advantage that it can be done with only a few high-quality microphones. However, the sole distribution of first-order Ambisonic recordings to playback loudspeakers is typically not convincing without going to higher orders and directional enhancements (Sect. 5.8).

#### *1.3.1 2D First-Order Ambisonic Recording and Playback*

The first-order Ambisonic format in 2D consists of one signal corresponding to an omnidirectional pickup pattern (called W), and two signals corresponding to the figure-of-eight pickup patterns aligned with the Cartesian axes (X and Y).

*Native 2D Ambisonic recording (Double-MS)*. To record the Ambisonic channels W, X, Y, one can use a Double-MS arrangement as shown in Fig. 1.5.

*2D Ambisonic recording with four 90*◦*-angled cardioids*. Extending the cardioid-based MS scheme of Fig. 1.3, four cardioid microphones could be used to obtain the front-back and left-right figure-of-eight pickup patterns by the corresponding pair-wise differences, and one omnidirectional pattern as their sum, Fig. 1.6. However, the use of 4 microphones for only 3 output signals is inefficient.

**Fig. 1.5** Native 2D first-order Ambisonic recording with an omnidirectional and a figure-of-eight microphone heading front, and a figure-of-eight microphone heading left; photo shown on the right

(a) 2D FOA with 4 cardioid microphones (b) Picture of recording setup

**Fig. 1.6** 2D first-order Ambisonic recording with four 90◦-angled cardioid microphones, sums and differences between them (front±back, left±right)

**Fig. 1.7** 2D first-order Ambisonics with three 120◦-angled cardioid microphones

*2D Ambisonic recording with three 120*◦*-angled cardioids*. Assuming 3 coincident cardioid microphones aiming at the angles 0◦, ±120◦ in the horizontal plane, cf. Fig. 1.7, we obtain the pickup pattern for the incoming sound

$$\mathbf{g}(\varphi) = \frac{1}{2} + \frac{1}{2} \begin{bmatrix} \cos(\varphi) \\ \cos(\varphi - 120^\circ) \\ \cos(\varphi + 120^\circ) \end{bmatrix}.$$

Summing all three microphone signals yields an omnidirectional pickup pattern, since $\sum\_{k=0}^{N-1} \cos\left(\varphi + \frac{2\pi}{N}\,k\right) = 0$ cancels all figure-of-eight contributions. Moreover, introducing the differences between the front and the two back microphone signals and between the left and right microphone signals yields an encoding matrix to obtain the omnidirectional W and the two figure-of-eight characteristics X and Y

$$
\frac{2}{3} \begin{bmatrix} 1 & 1 & 1 \\ 2 & -1 & -1 \\ 0 & \sqrt{3} & -\sqrt{3} \end{bmatrix} \mathbf{g}(\varphi) = \begin{bmatrix} 1 \\ \cos(\varphi) \\ \sin(\varphi) \end{bmatrix} . \tag{1.8}
$$
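Equation (1.8) can be verified numerically. The sketch below (NumPy assumed) orders the cardioid signals as front (0◦), left-back (+120◦), right-back (−120◦); a microphone aiming at the angle a picks up ½ + ½ cos(ϕ − a):

```python
import numpy as np

aims = np.radians([0, 120, -120])  # front, left-back, right-back
E = (2 / 3) * np.array([[1, 1, 1],
                        [2, -1, -1],
                        [0, np.sqrt(3), -np.sqrt(3)]])  # encoder of Eq. (1.8)

phi = np.radians(25.0)              # arbitrary source azimuth
g = 0.5 + 0.5 * np.cos(phi - aims)  # the three cardioid pickup gains
W, X, Y = E @ g
print(W, X, Y)  # W ≈ 1, X ≈ cos(phi), Y ≈ sin(phi)
```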

**Fig. 1.8** 2D first-order Ambisonic decoding to 4 loudspeakers

*2D Ambisonic decoding to loudspeakers*. The W, X, and Y channels of 2D first-order Ambisonics (Double-MS) can easily be played on an arrangement of four loudspeakers: front, back, left, right. While the omnidirectional signal contribution is played by all of the loudspeakers, the figure-of-eight contributions are played out-of-phase by the corresponding front-back or left-right pair of loudspeakers, Fig. 1.8.

$$
\begin{bmatrix} F \\ L \\ B \\ R \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} W \\ X \\ Y \end{bmatrix} . \tag{1.9}
$$

The decoding weights obviously discretize the directional pickup characteristics of the Ambisonic channels at the directions of the loudspeaker layout. Consequently, if the loudspeaker layout is more arbitrary and described by the set of its angles {ϕ*l*}, the *sampling decoder* can be given as

$$
\begin{bmatrix} S\_{\varphi\_{1}} \\ \vdots \\ S\_{\varphi\_{L}} \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & \cos(\varphi\_{1}) & \sin(\varphi\_{1}) \\ \vdots & \vdots & \vdots \\ 1 & \cos(\varphi\_{L}) & \sin(\varphi\_{L}) \end{bmatrix} \begin{bmatrix} W \\ X \\ Y \end{bmatrix} . \tag{1.10}
$$

To achieve a panning-invariant and balanced mapping by this decoder, the loudspeakers should be evenly arranged. Moreover, it can be favorable to sharpen the spatial image by attenuating *W* by $\frac{1}{\sqrt{3}}$ to map a sound by a supercardioid playback pattern.
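A sampling decoder for an arbitrary horizontal layout, Eq. (1.10), takes only a few lines. The sketch below assumes NumPy; the optional `w_weight` argument is an illustrative way to include the supercardioid sharpening by attenuating W:

```python
import numpy as np

def sampling_decoder(azimuths_deg, w_weight=1.0):
    """Decoder matrix of Eq. (1.10); w_weight = 1/sqrt(3) sharpens the image."""
    az = np.radians(azimuths_deg)
    return 0.5 * np.column_stack([w_weight * np.ones_like(az),
                                  np.cos(az), np.sin(az)])

# Front/left/back/right layout as in Eq. (1.9); a source encoded at 30°:
D = sampling_decoder([0.0, 90.0, 180.0, 270.0])
phi = np.radians(30.0)
wxy = np.array([1.0, np.cos(phi), np.sin(phi)])
print(D @ wxy)  # cardioid gains 0.5*(1 + cos(az_l - 30°)) per loudspeaker
```

Up to the global factor ½, the decoder matrix for this layout reproduces the matrix of Eq. (1.9).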

*Playback to head-tracked headphones and interactive rotation*. In headphone playback, the headphone signals are generated by convolution with the head-related impulse responses of all four loudspeakers contributing to the left and the right ear signals

$$
\begin{bmatrix} L\_{\text{ear}} \\ R\_{\text{ear}} \end{bmatrix} = \begin{bmatrix} h\_{\text{L}}^{0^{\circ}}(t) \ast & h\_{\text{L}}^{90^{\circ}}(t) \ast & h\_{\text{L}}^{180^{\circ}}(t) \ast & h\_{\text{L}}^{-90^{\circ}}(t) \ast \\ h\_{\text{R}}^{0^{\circ}}(t) \ast & h\_{\text{R}}^{90^{\circ}}(t) \ast & h\_{\text{R}}^{180^{\circ}}(t) \ast & h\_{\text{R}}^{-90^{\circ}}(t) \ast \end{bmatrix} \begin{bmatrix} F \\ L \\ B \\ R \end{bmatrix} . \tag{1.11}
$$

**Fig. 1.9** 2D first-order Ambisonic decoding to head-tracked headphones

To rotate the Ambisonic input scene of the decoder, it is sufficient to obtain a new set of figure-of-eight signals by mixing the X, Y channels with the following matrix depending on the rotation angle ρ, keeping W unaltered

$$
\begin{bmatrix} W \\ \tilde{X} \\ \tilde{Y} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos \rho & -\sin \rho \\ 0 & \sin \rho & \cos \rho \end{bmatrix} \begin{bmatrix} W \\ X \\ Y \end{bmatrix} . \tag{1.12}
$$

This effect is important for head-tracked headphone playback to render the VR/360◦ audio scene static around the listener. A complete playback system is shown in Fig. 1.9. The big advantage of such a system is that rotational updates can be done at high control rates while the HRIRs of the convolver stay constant.
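The rotation of Eq. (1.12) can be checked against re-encoding: rotating the signals of a source encoded at ϕ by the angle ρ must equal encoding the same source at ϕ + ρ. A minimal sketch assuming NumPy, with illustrative function names:

```python
import numpy as np

def rotator(rho):
    """Scene-rotation matrix of Eq. (1.12) for the rotation angle rho."""
    return np.array([[1, 0, 0],
                     [0, np.cos(rho), -np.sin(rho)],
                     [0, np.sin(rho),  np.cos(rho)]])

def encode_2d(phi):
    """First-order 2D encoding gains (W, X, Y) of a source at azimuth phi."""
    return np.array([1.0, np.cos(phi), np.sin(phi)])

phi, rho = np.radians(20.0), np.radians(55.0)
print(rotator(rho) @ encode_2d(phi))  # equals encode_2d(phi + rho)
```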

#### *1.3.2 3D First-Order Ambisonic Recording and Playback*

The first-order Ambisonic format in 3D consists of a signal W corresponding to an omnidirectional pickup pattern, and three signals (X, Y, and Z) corresponding to figure-of-eight pickup patterns aligned with the Cartesian coordinate axes.

In three dimensions, we cannot work with figure-of-eight patterns described by sin ϕ or cos ϕ of the azimuth angle only, anymore. It is more convenient to describe an arbitrarily oriented figure-of-eight characteristic cos φ using the inner product between a variable direction vector (direction of arriving sound) and a fixed direction vector (microphone direction). Direction vectors are of unit length, $\|\boldsymbol{\theta}\| = 1$, and their inner product corresponds to $\boldsymbol{\theta}\_1^{\mathrm{T}}\boldsymbol{\theta} = \cos\phi$, where φ is the angle enclosed by the direction of arrival $\boldsymbol{\theta}$ and the microphone direction $\boldsymbol{\theta}\_1$. Consequently, a cardioid pickup pattern aiming at $\boldsymbol{\theta}\_1$ is described by $\frac{1}{2} + \frac{1}{2}\boldsymbol{\theta}\_1^{\mathrm{T}}\boldsymbol{\theta}$.

*Native 3D Ambisonic recording (Triple-MS)*. To record the Ambisonic channels W, X, Y, Z, one can use a Triple-MS scheme as shown in Fig. 1.10. With the transposed unit direction vectors representing the aiming of the figure-of-eight channels, $\boldsymbol{\theta}\_X^{\mathrm{T}} = [1, 0, 0]$, $\boldsymbol{\theta}\_Y^{\mathrm{T}} = [0, 1, 0]$, $\boldsymbol{\theta}\_Z^{\mathrm{T}} = [0, 0, 1]$, to produce the direction dipoles $\boldsymbol{\theta}\_X^{\mathrm{T}}\boldsymbol{\theta}$,

**Fig. 1.10** Native 3D first-order Ambisonic recording with an omnidirectional and three figure-ofeight microphones aligned with the Cartesian axes X, Y, Z

$\boldsymbol{\theta}\_Y^{\mathrm{T}}\boldsymbol{\theta}$, and $\boldsymbol{\theta}\_Z^{\mathrm{T}}\boldsymbol{\theta}$, we can mathematically describe the pickup patterns of native 3D first-order Ambisonics as

$$\mathbf{g}\_{\rm WXYZ}(\boldsymbol{\theta}) = \begin{bmatrix} 1 \\ \begin{pmatrix} \boldsymbol{\theta}\_{X}^{\mathrm{T}} \\ \boldsymbol{\theta}\_{Y}^{\mathrm{T}} \\ \boldsymbol{\theta}\_{Z}^{\mathrm{T}} \end{pmatrix} \boldsymbol{\theta} \end{bmatrix} = \begin{bmatrix} 1 \\ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \boldsymbol{\theta} \end{bmatrix} = \begin{bmatrix} 1 \\ \boldsymbol{\theta} \end{bmatrix}. \tag{1.13}$$

*3D Ambisonic recording with a tetrahedral arrangement of cardioids*. The principle that worked for three cardioid microphones on the horizon also works for a coincident tetrahedral microphone array of cardioids with the aiming directions FLU-FRD-BLD-BRU, see Fig. 1.11 and [12],

$$\mathbf{g}(\boldsymbol{\theta}) = \frac{1}{2} + \frac{1}{2} \begin{bmatrix} \boldsymbol{\theta}_{\mathrm{FLU}}^{\mathrm{T}} \\ \boldsymbol{\theta}_{\mathrm{FRD}}^{\mathrm{T}} \\ \boldsymbol{\theta}_{\mathrm{BLD}}^{\mathrm{T}} \\ \boldsymbol{\theta}_{\mathrm{BRU}}^{\mathrm{T}} \end{bmatrix} \boldsymbol{\theta} = \frac{1}{2} + \frac{1}{2} \frac{1}{\sqrt{3}} \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & -1 \\ -1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix} \boldsymbol{\theta}. \tag{1.14}$$

Encoding is achieved there by the matrix that adds all microphone signals in the first line (W omnidirectional), subtracts back from front microphone signals in the second line (X figure-of-eight), subtracts right from left microphone signals in the third line (Y figure-of-eight), and subtracts down from up microphone signals in the last line (Z figure-of-eight), see also Fig. 1.11,

$$\mathbf{g}_{\mathrm{WXYZ}}(\boldsymbol{\theta}) = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ \sqrt{3}\begin{pmatrix} 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \end{bmatrix} \mathbf{g}(\boldsymbol{\theta}). \tag{1.15}$$
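A numerical sketch of this encoding in Python (the array names are ours): simulating the four ideal cardioid pickups of a plane wave and encoding them should recover the native channels [W, X, Y, Z] = [1, θ].

```python
import numpy as np

# Unit capsule directions FLU, FRD, BLD, BRU as rows, cf. Eq. (1.14).
DIRS = np.array([[ 1,  1,  1],
                 [ 1, -1, -1],
                 [-1,  1, -1],
                 [-1, -1,  1]]) / np.sqrt(3)

# Encoder: W sums all capsules; X, Y, Z take sqrt(3)-weighted differences.
ENC = 0.5 * np.vstack([np.ones(4),
                       np.sqrt(3) * np.array([[1,  1, -1, -1],
                                              [1, -1,  1, -1],
                                              [1, -1, -1,  1]])])

def cardioid_pickups(theta):
    """Ideal coincident cardioid signals for a plane wave from unit direction theta."""
    return 0.5 + 0.5 * DIRS @ theta

theta = np.array([1.0, 0.0, 0.0])        # sound from the front
wxyz = ENC @ cardioid_pickups(theta)     # -> [1, 1, 0, 0] = [1, theta]
```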

**Fig. 1.11** Tetrahedral arrangement of cardioid microphones for 3D first-order Ambisonics and their encoding; microphone capsules point inwards to minimize their spacing

**Fig. 1.12** Practical tetrahedral recording setups with cardioid microphones; the Soundfield SPS200, Oktava MK4012, and Soundfield ST450 offer a fixed 4-capsule geometry. Equally important: Zoom's H3-VR, Røde's NT-SF1, Sennheiser's AMBEO VR Mic

As Fig. 1.12 shows, practical microphone layouts should be as closely spaced as possible. Nevertheless, at high frequencies the microphones cannot be considered coincident anymore, and besides a directional error, there will be a loss of presence in the diffuse field. Typically, a shelving filter is used to slightly boost high frequencies. Roughly, a high-shelf filter with a 3 dB boost is sufficient to correct timbral defects at frequencies above which the microphone spacing exceeds half a wavelength, e.g., 5 kHz for a 3.4 cm spacing of the microphones. More advanced strategies are found, e.g., in [7, 13–15].
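The rule of thumb above can be expressed directly (a small illustrative helper of our own, not one of the advanced strategies of [7, 13–15]):

```python
def shelf_corner_frequency(spacing_m, c=343.0):
    """Frequency above which the spacing exceeds half a wavelength: f = c / (2 d)."""
    return c / (2.0 * spacing_m)

f = shelf_corner_frequency(0.034)  # 3.4 cm spacing -> roughly 5 kHz
```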

*3D Ambisonic decoding to loudspeakers*. As before in the 2D case, a *sampling decoder* can be defined that samples the continuous directivity patterns associated with the channels W, X, Y, Z at the discrete directions of the loudspeakers to map the signals there. Given the set of loudspeaker directions {**θ***l*} and the unit vectors $\boldsymbol{\theta}_X$, $\boldsymbol{\theta}_Y$, $\boldsymbol{\theta}_Z$, the loudspeaker signals of the sampling decoder become

$$
\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & \boldsymbol{\theta}_1^{\mathrm{T}}\boldsymbol{\theta}_X & \boldsymbol{\theta}_1^{\mathrm{T}}\boldsymbol{\theta}_Y & \boldsymbol{\theta}_1^{\mathrm{T}}\boldsymbol{\theta}_Z \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \boldsymbol{\theta}_L^{\mathrm{T}}\boldsymbol{\theta}_X & \boldsymbol{\theta}_L^{\mathrm{T}}\boldsymbol{\theta}_Y & \boldsymbol{\theta}_L^{\mathrm{T}}\boldsymbol{\theta}_Z \end{bmatrix} \begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix} = \underbrace{\frac{1}{2} \begin{bmatrix} 1 & \boldsymbol{\theta}_1^{\mathrm{T}} \\ \vdots & \vdots \\ 1 & \boldsymbol{\theta}_L^{\mathrm{T}} \end{bmatrix}}_{\boldsymbol{D}} \begin{bmatrix} W \\ X \\ Y \\ Z \end{bmatrix}. \tag{1.16}
$$

*Equivalent panning function*/*virtual microphone*. The sampling decoder together with the native Ambisonic directivity patterns $\mathbf{g}_{\mathrm{WXYZ}}^{\mathrm{T}}(\boldsymbol{\theta}) = [1, \boldsymbol{\theta}^{\mathrm{T}}]$ yields the mapping of a signal $s$ from the direction $\boldsymbol{\theta}$ to the loudspeakers as

$$
\begin{bmatrix} S_1 \\ \vdots \\ S_L \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & \boldsymbol{\theta}_1^{\mathrm{T}} \\ \vdots & \vdots \\ 1 & \boldsymbol{\theta}_L^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} 1 \\ \boldsymbol{\theta} \end{bmatrix} s = \frac{1}{2} \begin{bmatrix} 1 + \boldsymbol{\theta}_1^{\mathrm{T}}\boldsymbol{\theta} \\ \vdots \\ 1 + \boldsymbol{\theta}_L^{\mathrm{T}}\boldsymbol{\theta} \end{bmatrix} s. \tag{1.17}
$$
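Both relations can be checked numerically; the sketch below (layout and names are our assumptions) builds the sampling decoder of Eq. (1.16) for an octahedral layout and confirms that decoding an encoded source reproduces the cardioid gains of Eq. (1.17):

```python
import numpy as np

# Octahedral loudspeaker directions theta_l as rows.
SPEAKERS = np.array([[ 1,  0,  0], [ 0,  1,  0], [-1,  0,  0],
                     [ 0, -1,  0], [ 0,  0,  1], [ 0,  0, -1]], dtype=float)

# Sampling decoder, Eq. (1.16): each row is 1/2 * [1, theta_l^T].
D = 0.5 * np.hstack([np.ones((len(SPEAKERS), 1)), SPEAKERS])

theta = np.array([0.0, 1.0, 0.0])        # source panned to the left
wxyz = np.concatenate([[1.0], theta])    # first-order encoding [W, X, Y, Z]
gains = D @ wxyz

# Eq. (1.17): the same gains follow from the cardioid panning function.
cardioid_gains = 0.5 * (1.0 + SPEAKERS @ theta)
```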

This result means that the gain of a source from θ at each loudspeaker **θ***<sup>l</sup>* corresponds to evaluating a cardioid pattern aligned with θ. Consequently, the Ambisonic mapping corresponds to a signal distribution to the loudspeakers using weights obtained by discretization of an Ambisonics-equivalent first-order panning function.

Equivalently, Ambisonic playback using a sampling decoder is comparable to recording each loudspeaker signal with a virtual first-order cardioid microphone aligned with the loudspeaker's direction **θ***<sup>l</sup>* .

It is decisive for a panning-independent loudness mapping and balanced performance that the directions of the loudspeaker layout are well chosen. Also, it can be preferable to reduce the level of the omnidirectional channel $W$ by $\frac{1}{\sqrt{3}}$ to map a sound by the narrower supercardioid playback pattern instead of the rather broad cardioid pattern.
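The effect of the $\frac{1}{\sqrt{3}}$ weight on $W$ can be quantified in a few lines (an illustrative sketch with simplified scaling): instead of the cardioid's rear null, the supercardioid-like pattern trades a small rear side lobe for a narrower main lobe.

```python
import numpy as np

def playback_pattern(w_weight, phi):
    """First-order playback pattern 1/2*(w_weight + cos(phi)), up to overall scaling."""
    return 0.5 * (w_weight + np.cos(phi))

w = 1.0 / np.sqrt(3)                      # omni channel W reduced by 1/sqrt(3)
front = playback_pattern(w, 0.0)
back = abs(playback_pattern(w, np.pi))    # small negative side lobe at the rear
front_to_back_db = 20.0 * np.log10(front / back)  # about 11.4 dB
```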

Decoder design problems were addressed early on by Gerzon [16], Malham [17], and Daniel [18]. A current solution for higher-order decoding is given in Sect. 4.9.6 on All-round Ambisonic decoding.

*3D Ambisonic decoding to headphones*. 3D Ambisonic decoding to headphones uses the same approach as for 2D above, except that additional rotational degrees of freedom are implemented to compensate for any change in head orientation. Rotation concerns the three directional components X, Y, Z

$$
\begin{bmatrix}
\tilde{X} \\
\tilde{Y} \\
\tilde{Z}
\end{bmatrix} = \mathbf{R}(\alpha, \beta, \gamma) \begin{bmatrix}
X \\
Y \\
Z
\end{bmatrix}.\tag{1.18}
$$

For the definition of the rotation matrix *R*(α, β, γ) and the meaning of its angles, refer to Eq. 5.5 of Sect. 5.2.2. The selection of a suitable set of HRIRs is a question of discretizing the 3D directions, as addressed for the decoder above. The signals obtained for the virtual loudspeakers are again convolved with the corresponding HRIRs for the left and the right ear.
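A minimal head-tracking sketch for Eq. (1.18): a pure yaw rotation of X, Y, Z (the full R(α, β, γ) and the sign conventions are given in Sect. 5.2.2; the rotation direction below is illustrative only).

```python
import numpy as np

def rotation_z(alpha):
    """Rotation of the first-order components X, Y, Z about the vertical axis."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

xyz = np.array([1.0, 0.0, 0.0])                  # source encoded to the front
xyz_rotated = rotation_z(np.deg2rad(90)) @ xyz   # rotated 90 degrees to the left
# the X component moves to Y: [0, 1, 0]
```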

#### **1.4 Practical Free-Software Examples**

The practical examples below show first-order Ambisonic panning of a mono sound, decoded to simple loudspeaker layouts. These are either a square layout with 4 loudspeakers at the azimuth angles [0◦, 90◦, 180◦, −90◦], or an octahedral layout with 6 loudspeakers at azimuth [0◦, 90◦, 180◦, −90◦, 0◦, 0◦] and elevation [0◦, 0◦, 0◦, 0◦, 90◦, −90◦].

#### *1.4.1 Pd with Iemmatrix, Iemlib, and Zexy*

Pd is free and it can load and install its extensions from the internet. Required software components are:


Figure 1.13 gives an example of horizontal (2D) first-order Ambisonic panning, decoded to 4 loudspeaker signals and 2 headphone signals.

Figure 1.14 shows the processing inside the Pd abstraction [FOA\_binaural\_decoder] contained in the Fig. 1.13 example, which uses SADIE database<sup>1</sup> subject 1 (KU100 dummy head) HRIRs to render headphone signals.

Figure 1.15 sketches a first-order Ambisonic panning in 3D with decoding to an octahedral loudspeaker layout; master level [multiline∼] and hardware outlets [dac∼] were omitted for easier readability.

#### *1.4.2 Ambix VST Plugins*

This example uses a DAW and ready-to-use VST plug-ins to render first-order Ambisonics. As DAW, we recommend Reaper (reaper.fm) because it nicely facilitates higher-order Ambisonics by allowing tracks of up to 64 channels. Moreover, it is relatively low-priced and there is a fully functional free evaluation version available. You can also use any other DAW that supports VST and sufficiently many multi-track

<sup>1</sup>https://www.york.ac.uk/sadie-project/Resources/SADIEIIDatabase/D1/D1\_HRIR\_WAV.zip.

**Fig. 1.13** First-order 2D encoding and decoding in pure data (Pd) for a square layout

**Fig. 1.14** Binaural rendering to headphones on pure data (Pd) by convolution with SADIE KU100 HRIRs

channels. The example employs the freely available ambiX plug-in suite (http://www. matthiaskronlachner.com/?p=2015), although there exist other Ambisonics plug-ins, especially for first-order.


**Fig. 1.15** First-order 3D encoding and decoding in pure data (Pd) using [mtx\_spherical\_harmonics] for an octahedral layout ([dac∼] omitted for simplicity)

After creating the new track for the virtual source and importing a mono/stereo audio file (per drag-and-drop), the next step is the setup of the track channels. As shown in the table, the virtual source has a single-channel (mono) input and 4 output channels to send the 4 channels of first-order Ambisonics to the Master. The option to send to the Master is activated by default, cf. left in Fig. 1.16. The Master track itself requires 4 input channels and 6 output channels to feed the 6 loudspeakers (right). In Reaper, there is no separate adjustment for input and output channels, thus the Master track has to be set to 6 channels.

In the source track FX, the ambix\_encoder\_o1 can be used to encode the virtual source signal at an arbitrary location on a sphere by inserting the plug-in into the track of the virtual source, cf. its panning GUI in Fig. 1.17. For adding more sources, the track of the virtual source can simply be copied or duplicated. All effects and routing options are maintained for the new tracks.

In order to decode the 4 first-order Ambisonics Master channels to the loudspeakers the ambix\_decoder\_o1 plug-in is added to the Master track. The plug-in requires a preset that defines the decoding matrix and its channel sequence and normalization. For the exemplary octahedral setup with 6 loudspeakers, the following text can be copied to a text file and saved as config-file, e.g., "octahedral.config". The decoder


**Fig. 1.16** 1st-order example in Reaper DAW: routing

**Fig. 1.17** 1st-order example in Reaper DAW: encoder

matrix columns contain W, Y, Z, X (ACN channel sequence), with W as constant and Y, Z, X referring to the Cartesian coordinates of the octahedron directions.

```
#GLOBAL
/coeff_scale n3d
/coeff_seq acn
#END
#DECODERMATRIX
1 0 0 1
1 1 0 0
1 0 0 -1
1 -1 0 0
1 0 1 0
1 0 -1 0
#END
```
**Fig. 1.18** 1st-order example in Reaper DAW: decoder

After loading the preset into the decoder plug-in, the decoder can generate the loudspeaker signals as shown in Fig. 1.18. In the example, the virtual source is panned to the front, resulting in the highest level for loudspeaker 1 (front). Loudspeaker 3 (back) is about 12 dB quieter because of a side-lobe-suppressing supercardioid weighting implied by the switch /coeff\_scale n3d, as *a trick to keep things simple*.
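This level difference can be reproduced in a few lines of Python (a sketch, assuming ideal n3d scaling and the loudspeaker order front, left, back, right, up, down): a front-panned source carries a √3 weight on its dipole channels, so the decoder rows act as the supercardioid-like pattern 1 + √3 cos φ.

```python
import numpy as np

# Decoder matrix as in the config file; columns W, Y, Z, X (ACN sequence).
DEC = np.array([[1,  0,  0,  1],                 # 1: front
                [1,  1,  0,  0],                 # 2: left
                [1,  0,  0, -1],                 # 3: back
                [1, -1,  0,  0],                 # 4: right
                [1,  0,  1,  0],                 # 5: up
                [1,  0, -1,  0]], dtype=float)   # 6: down

# Front source encoded with n3d scaling: W = 1, dipoles weighted by sqrt(3).
wyzx = np.array([1.0, 0.0, 0.0, np.sqrt(3)])
gains = DEC @ wyzx
front_vs_back_db = 20.0 * np.log10(gains[0] / abs(gains[2]))  # about 11.4 dB
```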

As shown on the SADIE-II website,<sup>2</sup> the SADIE-II head-related impulse responses can be used to render Ambisonics to headphones. The listing below shows a configuration file to be used with ambix\_binaural, cf. Fig. 1.19, again selecting n3d as the trick to keep the numbers simple and to obtain the supercardioid weighting.

```
#GLOBAL
/coeff_scale n3d
/coeff_seq acn
#END
#HRTF
44K_16bit/azi_0,0_ele_0,0.wav
44K_16bit/azi_90,0_ele_0,0.wav
44K_16bit/azi_180,0_ele_0,0.wav
44K_16bit/azi_270,0_ele_0,0.wav
44K_16bit/azi_0,0_ele_90,0.wav
44K_16bit/azi_0,0_ele_-90,0.wav
#END
```
<sup>2</sup>https://www.york.ac.uk/sadie-project/ambidec.html.

**Fig. 1.19** 1st-order example in Reaper DAW: binaural decoder


For decoding to less regular loudspeaker layouts, the IEM AllRADecoder<sup>3</sup> permits editing loudspeaker coordinates and automatically calculating a decoder within the plugin. For decoding to headphones, the IEM BinauralDecoder offers a high-quality decoder. The technology behind both plugins is explained in Chap. 4.

In addition to the virtual sources, you can also add a 4-channel recording done with a B-format microphone by placing the 4-channel file in a new track. Reaper will automatically set the number of track channels to 4 and send the channels to the Master. Note that some B-format microphones use a different order and/or weighting of the Ambisonics channels. Simple conversion to the AmbiX-format can be done by inserting the ambix\_converter\_o1 plug-in into the microphone track.

#### **1.5 Motivation of Higher-Order Ambisonics**

*Diffuseness, spaciousness, depth?* Diffuse sound fields are typically characterized by sound arriving randomly from evenly distributed directions at evenly distributed delays. It is practical knowledge that the impression of diffuseness and spaciousness

<sup>3</sup>https://plugins.iem.at/.

benefits from decorrelated signals, which are typically obtained by large distances between the microphones rather than by coincident microphones.

Due to the evenness of diffuse sound fields, one would still hope that a low spatial resolution is sufficient to map the diffuseness and spatial depth of a room using coincident microphones or first-order Ambisonics. Nevertheless, the high directional correlation during playback defeats this hope and in fact yields perceptually impaired playback of diffuseness, spaciousness, and depth.

The technical advantages in interactivity and VR as well as the known shortcomings of first-order coincident recording techniques offer enough motivation to increase the directional resolution and go to higher-order Ambisonics, as presented in the subsequent chapters. For professional productions, it is often not sufficient to only rely on first-order coincident microphone recordings. By contrast, higher-order Ambisonics is able to drastically improve the mapping of diffuseness, spaciousness, and depth, as shown in the upcoming chapter about psychoacoustical properties of many-loudspeaker systems.

Recording with a higher-order main microphone array increases the required technological complexity. Nevertheless, digital signal processing and the theory presented in the later chapters are nowadays powerful enough to achieve this goal.

After all, delay-based stereophonic recording, such as AB, or equivalence-based recording, such as ORTF, INA5, etc., is often required, and its mapping properties for spaciousness and diffuseness are well known. What is nice about higher-order Ambisonics: it can make use of these benefits by embedding such recordings appropriately, see Fig. 1.20.

*Facts about higher orders*: Ambisonics extended to higher orders permits a refinement of the directional resolution and hereby improves the mapping of uncorrelated sounds in playback. Figure 1.21a shows the correlation introduced in two neighboring loudspeaker signals when using Ambisonics, given their spacing of 60◦. Given the just noticeable difference (JND) of the inter-aural cross correlation, the figure indicates that an Ambisonic order of ≥3 might be necessary to perceptually preserve decorrelation.

**Fig. 1.20** How is a microphone tree represented in Ambisonics when it consists of 6 cardioids spaced by 60 cm and 60◦ on a horizontal ring, and a ring of 4 supercardioids spaced by 40 cm and 90◦ as a height layer, pointing upwards?

(a) Inter-channel cross correlation of two Ambisonically driven loudspeakers spaced by 60◦

(b) Perceived depth in dependence of Ambisonics playback order for two listening positions

**Fig. 1.21** Relation between the Ambisonic order, decorrelation, and perceived depth

**Fig. 1.22** Perceptual sweet area of Ambisonic playback from the front at first (light gray), third (dark gray), and fifth order (black). It marks the area in which the perceived direction is plausible, i.e., does not collapse into a single loudspeaker other than C

For this reason, the perception of spatial depth strongly improves when increasing the Ambisonic order from 1 up to 3, Fig. 1.21b. However, this is only the case when seated at the central listening position. Outside this sweet spot, orders higher than 3, e.g., 5, additionally improve the mapping of depth [19]. Therefore, higher-order Ambisonics is important for preserving spatial impressions, especially when supplying a large audience.

Figure 1.22 shows that the sweet area of perceptually plausible playback increases with the Ambisonic order [20]. With fifth-order Ambisonics, nearly all the area spanned by the horizontal loudspeakers at the IEM CUBE, the 12 × 10 m concert space at our lab, becomes a valid listening area.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 2 Auditory Events of Multi-loudspeaker Playback**

*It is evident that until one knows what information needs to be presented at the listener's ears, no rational system design can proceed.*

Michael A. Gerzon 1976 and AES Vienna [1] 1992.

**Abstract** This chapter describes the perceptual properties of auditory events, the sound images that we localize in terms of direction and width, when distributing a signal with different amplitudes to one or a couple of loudspeakers. These amplitude differences are what methods for amplitude panning implement, and they are also what mapping of any coincident-microphone recording implies when reproduced over the directions of a loudspeaker layout. Therefore several listening experiments on localization are described and analyzed that are essential to understand and model the psychoacoustical properties of amplitude panning on multiple loudspeakers of a 3D audio system. For delay-based recordings or diffuse sounds, there is some relation, however, it is found to be less stable for the desired applications. Moreover, amplitude panning is not only about consistent directional localization. Loudness, spectrum, temporal structure, or the perceived width should be panning-invariant. The chapter also shows experiments and models required to understand and provide those panning-invariant aspects, especially for moving sounds. It concludes with openly-available response data of most of the presented listening experiments.

Starting from classic listening experiments on stereo panning by Leakey [2], Wendt [3], and pairwise horizontal panning by Theile [4], this chapter explores the relevant perceptual properties for 3D amplitude panning and their models. Important experimental studies considered here are, for instance, those by Simon [5], Kimura [6], F. Wendt [7], Lee [8], Helm [9], and Frank [10, 11]. The experimental results make it possible to firmly establish Gerzon's [1] $E$ and $\boldsymbol{r}_\mathrm{E}$ estimators for perceived loudness, direction, and width that apply to most stationary sounds in typical studio and performance environments.

#### **2.1 Loudness**

At a measurement point in the free field, the same signal fed to equalized loudspeakers at exactly the same acoustic distance would superimpose constructively (+6 dB).

In a room with early reflections and a less strict equality of the incoming pair of sounds (typically, slight inaccuracies in loudspeaker/listener position, different mounting situations, different directions in the directivities of ears and loudspeakers), the superposition can be regarded as stochastically constructive (+3 dB), in particular at frequencies that are not very low.

For the above reasoning, typical amplitude panning rules keep the weights that distribute the signal to the loudspeakers normalized by the square root of the sum of squares instead of by the linear sum, in order to obtain constant loudness ([12], VBAP):

$$g_l \leftarrow \frac{g_l}{\sqrt{\sum_{l=1}^L g_l^2}}.\tag{2.1}$$
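In code, this normalization is a one-liner (a minimal sketch with illustrative names):

```python
import numpy as np

def normalize_gains(g):
    """Scale panning gains so that E = sum of squared gains equals 1, cf. Eq. (2.1)."""
    g = np.asarray(g, dtype=float)
    return g / np.sqrt(np.sum(g ** 2))

g = normalize_gains([1.0, 1.0])   # equal-level pair -> 1/sqrt(2) each
E = np.sum(g ** 2)                # stays 1 for any panning position
```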

*Loudness Model*. If all loudspeakers are equalized, located at the same distance from the listener, and fed by the same signal with different amplitude gains $g_l$, constructive interference could be expected, so that the amplitude becomes [1]

$$P = \sum_{l=1}^{L} g_l.\tag{2.2}$$

However, the interference stops being strictly constructive as soon as the room is not entirely anechoic, the listening position is not exactly centered, or, even under anechoic and centered conditions, at high frequencies, where the superposition at the ears can no longer be assumed to be purely constructive. Then it is better to assume a less well-defined, stochastic superposition in which the squared amplitude is determined by the sum of the squared weights [1]:

$$E = \sum_{l=1}^{L} g_l^2. \tag{2.3}$$

Therefore, the most common amplitude panning rules normalize by the root of the sum of squared gains to obtain a loudness impression that is as constant as possible.

The measure *E* seems to be most useful when designing and evaluating amplitude-panning or coincident microphone techniques. It is not surprising that ITU-R BS.1770-4<sup>1</sup> uses the $L_{eq}$(RLB) measure as a loudness model: it is essentially the RMS level after high-pass filtering, cf. [13], which is closely related to the *E* measure of the loudspeaker signals.

An interesting refinement was proposed by Laitinen et al. [14], which uses a measure $\sqrt[p]{\sum_{l=1}^{L} g_l^p}$ in which the exponent $p$ is close to 1 at low frequencies under anechoic conditions and close to 2 at high frequencies/under reverberant conditions.

#### **2.2 Direction**

In the early years of stereophony, researchers investigated the differences in delay times and amplitudes required to control the perceived direction. Below, only experiments are considered that did not use fixation of the listener's head.

#### *2.2.1 Time Differences on Frontal, Horizontal Loudspeaker Pair*

The dissertation of K. Wendt in 1963 [3] presents notably accurate listening experiments on ±30◦ two-channel stereophony using time delays, in which listeners indicated from where they heard the sounds for each of the tested time differences. H. Lee revisited the properties in 2013 [8], but with musical sound material and an experiment in which the listener adjusted the time differences until the perceived direction matched that of a corresponding fixed reference loudspeaker, Fig. 2.1.

The time differences are seldom applicable to reliable angular auditory event placement: auditory images are strongly frequency dependent (not shown here) and therefore unstable for narrow-band sounds. Leakey and Cherry showed in 1957 [2] that time-delay stereophony loses its effect in the presence of background noise.

#### *2.2.2 Level Differences on Frontal, Horizontal Loudspeaker Pair*

K. Wendt's [3] and H. Lee's [8] experiments also deliver insights into sound-source positioning with ±30◦ two-channel stereophony, this time with level differences.

As opposed to Fig. 2.1, in which auditory image panning with time differences was characterized by statistical spreads of up to 15◦, level-difference-based panning yields a clearly smaller spread of perceived directions, below 10◦, Fig. 2.2.

*Signal dependency*. Wendt [3] described the signal dependency of panning curves on various transient and band-limited sounds, and Lee [8] for musical sounds. A new

<sup>1</sup>https://www.itu.int/rec/R-REC-BS.1770-4-201510-I/en Algorithms to measure audio programme loudness and true-peak audio level (10/2015).

**Fig. 2.1** K. Wendt's experiment [3] used angular marks to help specify the localized direction (left). Right: results for time differences between impulse signals fed to the loudspeakers, without head fixation (the diagram shows means and standard deviations; the standard deviation was interpolated for the figure). In gray: results of the time-difference adjustment experiment of Lee [8] using musical material (25, 50, 75% quartiles, symmetrized diagram)

**Fig. 2.2** Wendt's [3] results for crack (impulsive) signals with level differences and without head fixation (the figure shows means and standard deviations; the standard deviation was interpolated to plot this figure). In gray: results of Lee's [8] level-difference adjustment experiment with musical sounds (25, 50, 75% quartiles, symmetrized diagram)

comprehensive investigation on frequency dependency was carried out by Helm and Kurz [9]. With level differences {0, 3, 6, 9, 12} dB and third-octave filtered pulsed pink noise at {125, 250, 500, 1k, 2k, 4k} Hz, they showed that the perceived angle pointed at by the listeners using a motion-tracked pointer was similar between the broad-band case and third-octave bands below 2 kHz. In bands above 2 kHz, smaller level differences cause a larger lateralization, see interpolated curves in Fig. 2.3.

**Fig. 2.3** Panning curve for frontal ±30◦ loudspeaker pair from [9] on the example of the 500 and 4 kHz third-octave band and the slopes for different bands, based on the 3 and 6 dB conditions

#### *2.2.3 Level Differences on Horizontally Surrounding Pairs*

Successive pairwise panning on neighboring loudspeaker pairs is typically used to pan auditory events freely along the loudspeakers of a horizontally surrounding loudspeaker ring. The classical research specifically targeting such applications was contributed by Theile and Plenge in 1977 [4]. They used a mobile reference loudspeaker with a reference sound that could be moved to match the perceived direction of a loudspeaker pair playing pink noise with level differences, at different orientations with respect to the listener's head. There is also the experiment of Pulkki [15] using a level-adjustment task, in which levels were adjusted so as to match the auditory event to that of a reference loudspeaker at three different reference directions and for different head orientations. A comprehensive experiment was done by Simon et al. [5], who used a graphical user interface displaying the floor plan of a 45◦-spaced loudspeaker ring to have the listeners specify the perceived direction. Martin et al. in 1999 [16] used a graphical user interface showing the floor plan of a 5.1 ring in their experiment, and last but not least, Matthias Frank used a direct pointing method to enter the perceived direction [10] in one of his experiments.

As the experiments did not seem to yield consistent results, a comprehensive level-difference adjustment experiment with 24 loudspeakers arranged as a horizontal ring was done in [17] and partially repeated later in [11], see results in Fig. 2.4. The repeated experiment [11] made clear that in the anechoic room, a large part of the differently pronounced localization biases can be avoided by encouraging the listeners to make front-back and left-right head motions of a few centimeters whenever there is doubt.

**Fig. 2.4** Medians and 95% confidence intervals for adjusted level differences to align amplitude-panned pink noise with a harmonic complex tone from {±15◦, 0◦}, for **a** frontal and **b** lateral 60◦ stereo pair; **a** uses data from [17] with 4 responses per direction from 5 listeners; **b** uses data from [11] with 20 responses per direction. Despite the considerably different spread, frontal and lateral stereo pairs seem to yield pretty much the same tendency

#### *2.2.4 Level Differences on Frontal, Horizontal to Vertical Pairs*

T. Kimura investigated the localization of auditory events between frontal, vertical ±13.5◦ loudspeaker pairs quite extensively in 2012 [6, 18]. The work of F. Wendt in 2013 [7, 19] also investigates slanted and vertical loudspeaker pairs, Fig. 2.5. Kimura used pulsed white noise, Wendt pulsed pink noise.

Obviously, the horizontal spread is always smaller than the vertical spread and the spread does not align with the direction of the loudspeaker pair. The largest vertical spread appears for the vertical loudspeaker pair.

#### *2.2.5 Vector Models for Horizontal Loudspeaker Pairs*

A weighted sum of the loudspeakers' direction vectors $\boldsymbol{\theta}_1$, $\boldsymbol{\theta}_2$ can be conceived as a simple linear model of the perceived direction, using a linear blending parameter $0 \le q \le 1$

$$
\boldsymbol{r} = (1 - q)\,\boldsymbol{\theta}\_1 + q\,\boldsymbol{\theta}\_2. \tag{2.4}
$$

The parameter $q$ adjusts where the resulting vector $\boldsymbol{r}$ is located on the connecting line between $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$. On frontal loudspeaker pairs, localization curves typically run through the middle direction $q = \frac{1}{2}$ for level differences of 0 dB. If only one

**Fig. 2.5** Mean values and 95% confidence intervals of the direct-pointing experiments of Kimura (top) with level differences on a vertical ±13.5◦ loudspeaker pair and results of F. Wendt (bottom) on frontally arranged horizontal, slant, and vertical ±20◦ loudspeaker pairs showing two-dimensional 95% confidence (solid) and standard deviation ellipses (dotted)

loudspeaker is active, the result is either of the loudspeaker directions, and thus the parameter is $q = 0$ or $q = 1$.

*Classical definitions*. As the simplest choice for $q$, one could insert $q = \frac{g_2}{g_1 + g_2}$ or $q = \frac{g_2^2}{g_1^2 + g_2^2}$ to get the vector definitions as weighted averages using either the linear or the squared gains according to [1]:

$$r\_{\rm V} = \frac{g\_1 \,\theta\_1 + g\_2 \,\theta\_2}{g\_1 + g\_2}, \qquad \qquad \qquad r\_{\rm E} = \frac{g\_1^2 \,\theta\_1 + g\_2^2 \,\theta\_2}{g\_1^2 + g\_2^2}. \tag{2.5}$$

For both models, equal gains $g_1 = g_2$ yield $q = \frac{1}{2}$, and also the endpoints with $g_2 = 0$ or $g_1 = 0$ correspond to $q = 0$ or $q = 1$, respectively. However, the slope of the $\boldsymbol{r}_\mathrm{E}$ vector is steeper than that of $\boldsymbol{r}_\mathrm{V}$. For instance, if $g_2 = 2\,g_1$, the vector $\boldsymbol{r}_\mathrm{V}$ lies at $q = 2/3$ of the line between $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, while $\boldsymbol{r}_\mathrm{E}$ lies at $q = 4/5$ of the connecting line.

The $\boldsymbol{r}_\mathrm{V}$ vector for the $\pm\alpha$ loudspeaker pair at the directions $\boldsymbol{\theta}_{1,2}^{\mathrm{T}} = (\cos\alpha, \pm\sin\alpha)$ corresponds to the tangent law [20], whose formal origin lies in a model of summing localization based on a simple model of the ear signals, cf. Appendix A.7. The equivalence of this law to the vector model follows from the tangent $\tan\varphi$ as the ratio of the $y$ and $x$ components of the $\boldsymbol{r}_\mathrm{V}$ vector: $\tan\varphi = \frac{g_1\sin\alpha + g_2\sin(-\alpha)}{g_1\cos\alpha + g_2\cos\alpha} = \frac{g_1 - g_2}{g_1 + g_2}\tan\alpha$.
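The worked example above (g₂ = 2 g₁ giving q = 2/3 for r_V and q = 4/5 for r_E) can be verified with a short sketch (names are ours) that also anticipates the adjustable-slope generalization discussed next:

```python
import numpy as np

def r_model(g1, g2, th1, th2, gamma):
    """Weighted-average direction vector; gamma=1 gives r_V, gamma=2 gives r_E."""
    w1, w2 = abs(g1) ** gamma, abs(g2) ** gamma
    return (w1 * th1 + w2 * th2) / (w1 + w2)

a = np.deg2rad(30)                          # +-30 degree stereo pair
th1 = np.array([np.cos(a), np.sin(a)])
th2 = np.array([np.cos(a), -np.sin(a)])

rV = r_model(1.0, 2.0, th1, th2, gamma=1)   # at q = 2/3 of the connecting line
rE = r_model(1.0, 2.0, th1, th2, gamma=2)   # at q = 4/5

d = th2 - th1                               # recover q by projection onto the line
qV = np.dot(rV - th1, d) / np.dot(d, d)
qE = np.dot(rE - th1, d) / np.dot(d, d)
```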

**Fig. 2.6** Fit of the *r*V, *r*E, and *r*<sup>γ</sup> models for **a** third-octave noise on a frontal stereo pair using data from [9], and with data from [11]: **b** pink noise frontal and **c** lateral, cf. Figs. 2.3 and 2.4; **d** horizontal and vertical from [7], Fig. 2.5

*Adjusted slope*. Differently steep curves were fitted by an adjustable-slope model [17]

$$r\_{\gamma} = \frac{|g\_1|^{\gamma}\,\boldsymbol{\theta}\_1 + |g\_2|^{\gamma}\,\boldsymbol{\theta}\_2}{|g\_1|^{\gamma} + |g\_2|^{\gamma}},\tag{2.6}$$

which uses γ = 1 for *r*<sup>V</sup> and γ = 2 for *r*E. Figure 2.6 compares the prediction by *r*V, *r*E, and *r*<sup>γ</sup> to frequency-dependently perceived directions in frontal horizontal pairs, to perceived directions in a lateral stereo pair, and to perceived directions in a frontal pair that is either horizontal or vertical, using various studies mentioned above.

*Practical choice r*E. While a specific exponent γ closely fitting the experimental data may vary, a constant value is preferable. Figure 2.6 indicates that in most cases focusing on *r*<sup>E</sup> is reasonable and sufficiently precise, see also [11].
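A minimal sketch of the adjustable-slope model of Eq. (2.6) (Python, with a function name of our own choosing) illustrates the different slopes of the $\gamma = 1$ and $\gamma = 2$ models for the example $g_2 = 2\,g_1$ from above:

```python
def r_gamma(g1, g2, th1, th2, gamma):
    """Adjustable-slope direction vector of Eq. (2.6):
    gamma = 1 gives r_V, gamma = 2 gives r_E."""
    w1, w2 = abs(g1) ** gamma, abs(g2) ** gamma
    s = w1 + w2
    return tuple((w1 * a + w2 * b) / s for a, b in zip(th1, th2))
```

With $\boldsymbol{\theta}_1 = (1,0)$ and $\boldsymbol{\theta}_2 = (0,1)$, the second component directly reads off the position $q$ on the connecting line: $2/3$ for $\gamma = 1$ and $4/5$ for $\gamma = 2$.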

**Fig. 2.7** Indirect level-adjustment experiment of Pulkki [21] shows the spread and mean of the adjusted VBAP angles for frontal loudspeaker triplets, and the experiments of F. Wendt [7, 19] use a direct pointing method to obtain results in the shape of two-dimensional 95% confidence (solid) and standard deviation ellipses (dotted) for {−∞, 0, +11.71} dB for the top loudspeaker (left diagram), or the right loudspeaker (center diagram) respectively, or {−∞, 0, +11.51} dB for the bottom loudspeaker (right)

#### *2.2.6 Level Differences on Frontal Loudspeaker Triangles*

V. Pulkki [21] and F. Wendt [7, 19] investigated localization properties for frontal loudspeaker triplets with level differences, see Fig. 2.7. Both used pulsed pink noise in their experiments.

While V. Pulkki used an indirect adjustment task, evaluating the VBAP control angles that made auditory events directionally match the respective reference loudspeakers, F. Wendt used a direct pointing method. Wendt's experiments indicate that loudspeaker triplets with three different azimuthal positions yield a smaller spread in the indicated directions than triplets containing vertical loudspeaker pairs (which was not the case in Pulkki's experiments).

#### *2.2.7 Level Differences on Frontal Loudspeaker Rectangles*

F. Wendt [7, 19] moreover presents experiments on frontal loudspeaker rectangles, again using a pointing method and pulsed pink noise, Fig. 2.8.

Again it seems that arrangements avoiding vertical loudspeaker pairs exhibit a smaller statistical spread in the responses.

**Fig. 2.8** Wendt's experiments about frontal loudspeaker rectangles showing two-dimensional 95% confidence (solid) and standard deviation ellipses (dotted). The experimental setup of this and above-mentioned experiments is shown. Left: each of the corner loudspeakers is raised once by +6 dB in level, right: both left/right loudspeaker levels are raised once by {+3, +6} dB, and both top/bottom pairs are once raised by +6 dB

#### *2.2.8 Vector Model for More than 2 Loudspeakers*

For more than two active loudspeakers and in 3D, a vector model based on the exponent γ = 2 yields the *r*<sup>E</sup> vector [1]

$$r\_{\rm E} = \frac{\sum\_{l=1}^{\rm L} g\_l^2 \,\theta\_l}{\sum\_{l=1}^{\rm L} g\_l^2}. \tag{2.7}$$
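Equation (2.7) is straightforward to evaluate; a small sketch (the function name is ours):

```python
def r_E(gains, directions):
    """Energy vector of Eq. (2.7) for L active loudspeakers,
    given gains g_l and unit direction vectors theta_l."""
    e = sum(g * g for g in gains)  # total energy in the denominator
    dim = len(directions[0])
    return tuple(sum(g * g * d[i] for g, d in zip(gains, directions)) / e
                 for i in range(dim))
```

A single active loudspeaker returns its own direction (vector length 1), while two equally loud loudspeakers return the mean of their direction vectors (length below 1).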

#### *2.2.9 Vector Model for Off-Center Listening Positions*

At off-center listening positions, the distances to the loudspeakers are not equal anymore, resulting in additional attenuation and delay for each loudspeaker depending on the position. For stationary sounds, this effect can be incorporated into the energy vector by additional weights $w_{r,l}$ and $w_{\tau,l}$


$$\boldsymbol{r}_\mathrm{E} = \frac{\sum_{l=1}^{L} (w_{r,l}\, w_{\tau,l}\, g_l)^2\, \boldsymbol{\theta}_l}{\sum_{l=1}^{L} (w_{r,l}\, w_{\tau,l}\, g_l)^2}. \tag{2.8}$$

The weight $w_{r,l}$ models the $\frac{1}{r}$ attenuation of point-source-like propagation. The reference distance is the distance to the closest loudspeaker at the evaluated listening position, thus the weight of each loudspeaker results in

$$w_{r,l} = \frac{1}{r_l}.\tag{2.9}$$

The incorporation of delays into the energy vector requires a transformation that yields the weights $w_{\tau,l}$ for each loudspeaker. It is reasonable that these weights attenuate the lagging signals in order to reduce their influence on the predicted direction. An attenuation of $\frac{1}{4}\,\frac{\mathrm{dB}}{\mathrm{ms}}$ is known from the echo threshold in [22], similarly [23], and has successfully been applied for the prediction of localization in rooms [24]. With the delay $\tau_l = \frac{r_l}{c}$ in seconds at the listening position under test, the weight of each loudspeaker is calculated as

$$w_{\tau,l} = 10^{-\frac{1000\,\tau_l}{4\cdot 20}}.\tag{2.10}$$

Further weights can be applied in order to model the precedence effect in more detail, as proposed by Stitt [25, 26]. Listening test results in [27] compared these extensions of the energy vector of different complexity and revealed that the simple weighting with $w_{r,l}$ and $w_{\tau,l}$ is sufficient for a rough prediction of the perceived direction in typical playback scenarios.
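A sketch of the weighted energy vector of Eqs. (2.8)-(2.10) for off-center positions (our own Python illustration; delays are taken relative to the closest loudspeaker, which differs from the absolute $\tau_l = r_l/c$ only by a common factor that cancels in the normalized vector):

```python
import math

def r_E_offcenter(gains, positions, listener, c=343.0):
    """Energy vector with distance (1/r) and delay weights, cf. Eqs. (2.8)-(2.10).
    positions/listener are Cartesian tuples; lagging signals are attenuated
    by 1/4 dB per ms of delay relative to the earliest arrival."""
    rs = [math.dist(p, listener) for p in positions]
    r_min = min(rs)
    w = []
    for g, r in zip(gains, rs):
        w_r = 1.0 / r                          # Eq. (2.9)
        tau = (r - r_min) / c                  # relative lag in seconds
        w_tau = 10.0 ** (-1000.0 * tau / (4.0 * 20.0))  # Eq. (2.10)
        w.append((w_r * w_tau * g) ** 2)
    e = sum(w)
    dim = len(positions[0])
    # theta_l: unit vectors from the listener towards each loudspeaker
    return tuple(sum(wi * (p[i] - listener[i]) / r
                     for wi, p, r in zip(w, positions, rs)) / e
                 for i in range(dim))
```

At the central position of a symmetric pair the prediction stays central; shifting the listener pulls the predicted direction towards the closer loudspeaker.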

The left side of Fig. 2.9 shows the predicted directions by the energy vector for various listening positions when playing back the same signal on a standard stereo loudspeaker pair with a radius of 2.5 m. The absolute localization error can be calculated from the difference of the predicted direction and the desired panning direction. The right side of Fig. 2.9 depicts areas with localization errors within 4 ranges: 0◦ ... 10◦ (white, perfect localization), 10◦ ... 30◦ (light gray, plausible localization), 30◦ ... 90◦ (gray, rough localization), and >90◦ (dark gray, poor localization).

Concerning a single playback scenario, i.e. a single panning direction on a loudspeaker setup, the perceptual sweet area for plausible playback can be estimated by the area with localization errors below 30◦. For the prediction of a more general sweet area, the absolute localization errors can be computed for all possible panning directions in a fine grid of 1◦ and averaged at each listening position as shown in Fig. 2.10.

**Fig. 2.9** Predictions of perceived directions by the energy vector for different listening positions in a standard stereo setup with two loudspeakers playing the same signal. Gray-scale areas on the right indicate listening areas with predicted absolute localization errors within different angular ranges

#### **2.3 Width**

M. Frank [10] investigated the auditory source width for frontal loudspeaker pairs with 0 dB level difference and various aperture angles, as well as the influence of an additional center loudspeaker on the auditory source width. The response was given by reading numbers off a left-right symmetric scale written on the loudspeaker arrangement (Fig. 2.11).

Figure 2.11 (right) shows the statistical analysis of the responses. Obviously the additional center loudspeaker decreases the auditory source width.

Auditory source width is difficult to compare across directions, and single loudspeakers already yield auditory source widths that vary with direction. Still, a relatively constant auditory source width is desirable for moving auditory events. For static auditory events, the narrowest-possible extent can be desirable.

**Fig. 2.11** Experimental setup and results of experiments of M. Frank (confidence intervals) about auditory source width of frontal stereo pairs of the angles ±5◦,..., ±40◦ and with an additional center loudspeaker (C)

#### *2.3.1 Model of the Perceived Width*

The angle $2\arccos\|r_\mathrm{E}\|$ describes the aperture of a spherical cap cut off the unit sphere by a plane perpendicular to the $r_\mathrm{E}$ vector at its tip, as seen from the origin, see Fig. 2.12. As the $r_\mathrm{E}$ vector length lies between 0 (unclear direction) and 1 (only one loudspeaker active), this angle stays between 180◦ and 0◦.

M. Frank's experiments about the auditory source width [10, 28] showed that stereo pairs of larger half angles α were also heard as wider. The length of the $r_\mathrm{E}$ vector gets shorter with the half angle α. In a symmetrical loudspeaker pair $\boldsymbol{\theta}_{1,2}^\mathrm{T} = (\cos\alpha,\, \pm\sin\alpha)$ with $g_1 = g_2 = 1$, the $y$ coordinate of the $r_\mathrm{E}$ vector cancels and its length is

$$\|r\_{\mathrm{E}}\| = r\_{\mathrm{E},\mathrm{x}} = \cos \alpha.$$

The corresponding spherical cap has the same aperture as the loudspeaker pair, $2\arccos\|r_\mathrm{E}\| = 2\alpha$. However, the listeners of the experiments indicated only $\frac{5}{8}$ of this size, which yields the following estimator of the perceived width:

$$\mathrm{ASW} = \frac{5}{8}\cdot\frac{180^\circ}{\pi}\cdot 2\arccos\|\boldsymbol{r}_\mathrm{E}\|.\tag{2.11}$$

**Fig. 2.12** Cap size associated with *r*<sup>E</sup> length model for L+R (left plot) and L+R+C (right plot)

For an additional center loudspeaker with $g_3 = 1$, $\boldsymbol{\theta}_3^\mathrm{T} = (1, 0)$, the $r_\mathrm{E}$ length becomes

$$\|r\_{\mathrm{E}}\| = r\_{\mathrm{E},\mathrm{x}} = \frac{1}{3} + \frac{2}{3}\cos\alpha,$$

an increase matching the experiments, as $\arccos\|r_\mathrm{E}\| < \alpha$, see Figs. 2.12 and 2.13.
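The width estimator of Eq. (2.11) can be sketched in a few lines (Python, with a function name of our own):

```python
import math

def asw_estimate(r_E_len):
    """Perceived auditory source width in degrees, Eq. (2.11):
    5/8 of the spherical-cap aperture 2*arccos(||r_E||)."""
    return 5.0 / 8.0 * math.degrees(2.0 * math.acos(r_E_len))

# stereo pair at +/-30 deg, equal gains:       ||r_E|| = cos(30 deg)
# with additional center loudspeaker, Eq. above: ||r_E|| = 1/3 + 2/3*cos(30 deg)
```

For the $\pm 30^\circ$ pair the estimate is $\frac{5}{8}\cdot 60^\circ = 37.5^\circ$; the longer $r_\mathrm{E}$ with the center loudspeaker yields a narrower estimate, matching the experiments.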

#### **2.4 Coloration**

Although research primarily focuses on the spatial fidelity of multi-loudspeaker playback, the overall quality of surround sound playback was found to be largely determined by timbral fidelity (70%) [29]. Loudspeakers in a studio or performance space are often characterized by different colorations caused by different reflection patterns (most often from the wall behind the loudspeaker). When changing the active loudspeakers, or their number, these differences become audible. On the one hand, static coloration, e.g. the frequency responses of the loudspeakers, can typically be equalized. On the other hand, changes in coloration during the movement of a source cannot be equalized easily and yield annoying comb filters.

Although coloration is often assessed verbally [30], we employ a simple technical predictor based on the composite loudness level (CLL) by Ono [31, 32]. The CLL spectrum predicts the perceived coloration and is calculated from the sum of the loudnesses of both ears in each third-octave band. Studies about loudspeaker and headphone equalization show that differences in third-octave band levels of less than 1 dB are inaudible to most listeners [33, 34]. This criterion can also be applied to the perception of coloration, i.e., differences between CLL spectra of less than 1 dB are assumed to be inaudible.

Pairwise panning between loudspeakers results in a single active loudspeaker for source directions that coincide with the direction of a loudspeaker and two equally loud loudspeakers for source directions exactly between two neighboring loudspeakers, cf. Fig. 2.14. In the second case, the different propagation paths from the two loudspeakers to the ears create a comb filter. This comb filter is not present for sources

**Fig. 2.14** Coloration predicted by composite loudness levels for a single loudspeaker C (black), two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray)

played from a single loudspeaker. Thus, moving a source between the two directions yields noticeable coloration. This is in contrast to static sources, for which Theile's experiments [35] indicated that they are perceived without coloration.

The actual shape of the afore-mentioned comb filter depends on the angular distance between the loudspeakers. The first notch and its depth decrease with the distance. This implies that coloration increases for playback with higher loudspeaker densities.
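As an illustrative sketch (our own simplification, not the CLL computation of [31, 32]), the comb filter of two coherent, equally loud arrivals with different path lengths can be written in closed form:

```python
import math

def comb_magnitude_db(f, r1, r2, c=343.0):
    """Magnitude in dB of two superimposed unit-gain arrivals with path
    lengths r1 and r2 (meters): |1 + e^{-j 2 pi f dt}| = 2 |cos(pi f dt)|
    with dt = (r2 - r1)/c; the first notch lies at f = c / (2 (r2 - r1))."""
    dt = (r2 - r1) / c
    mag = 2.0 * abs(math.cos(math.pi * f * dt))
    return 20.0 * math.log10(mag) if mag > 0.0 else float("-inf")
```

A path difference of 0.343 m (about 1 ms) places the first notch at 500 Hz; a larger angular distance between the loudspeakers increases the path difference and moves the first notch downwards.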

A similar comb filter is created when using a triplet of loudspeakers with the same loudspeaker density as the pair, e.g. L, C, R compared to C, R. In order to avoid a strong increase in source width or annoying phasing effects, the outermost loudspeakers L and R are strongly reduced in level, typically by around −12 dB compared to loudspeaker C. In doing so, the similarity of the comb filters yields barely any coloration when moving a source between the two directions, cf. Fig. 2.15.

Judging from what is shown above, it appears beneficial to always keep a few loudspeakers active to stabilize the coloration, as opposed to using just one loudspeaker and moving the playback to another one. Keeping the number of simultaneously active loudspeakers more or less constant not only prevents coloration during source movements, it also yields a more constant source width. Because of this relation between coloration and source width, the fluctuation of $r_\mathrm{E}$ is also a simple predictor of panning-dependent coloration.

In general, the strongest coloration is perceived under anechoic listening conditions. In reverberant rooms, the additional comb filters introduced by reflections help to conceal the comb filters due to multi-loudspeaker playback.

**Fig. 2.15** Coloration predicted by composite loudness levels for loudspeaker C with additional – 12 dB from L and R (black), two equally loud loudspeakers C and R (light gray), and their difference (dashed dark gray)

#### **2.5 Open Listening Experiment Data**

Experimental data from azimuthal localization in frontal and lateral loudspeaker pairs Figs. 2.3 and 2.4, azimuthal/elevational localization in horizontal, skew, and vertical frontal pairs Fig. 2.5, triangles Fig. 2.7, and quadrilaterals Fig. 2.8 are available online at https://opendata.iem.at in the listening experiment data project, as well as the data to the width experiment in Fig. 2.11.

The opendata.iem.at listening experiment data project contains evaluation routines to analyze 95%-confidence intervals symmetrically based on means, standard deviations, and the inverse Student's t-distribution (CIMEAN.m), more robustly based on medians, inter-quartile ranges, and Student's t-distribution (CI2.m), or for two-dimensional data analysis (robust\_multivariate\_confidence\_region.m). The MATLAB script plot\_gathered\_data.m reads the formatted listening experiment data, and its exemplary code generates figures like the above.

In order to support others in providing their own listening experiment data, the MATLAB functions write\_experimental\_data.m and read\_experimental\_data.m are provided on the website.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 3 Amplitude Panning Using Vector Bases**

*The method is straightforward and can be used on many occasions successfully.*

Ville Pulkki [1], Ph.D. Thesis, 2001.

**Abstract** This chapter describes Ville Pulkki's famous vector-base amplitude panning (VBAP) as the most robust and generic amplitude panning algorithm that works on nearly any surrounding loudspeaker layout. VBAP activates the smallest-possible number of loudspeakers, which gives a directionally robust auditory event localization for virtual sound sources, but it can also cause fluctuations in width and coloration for moving sources. Multiple-direction amplitude panning (MDAP) proposed by Pulkki is a modification that increases the number of activated loudspeakers. In this way, more direction-independence is achieved at the cost of an increased perceived source width and reduced localization accuracy at off-center positions. As vector-base panning methods rely on convex-hull triangulation, irregular loudspeaker layouts yielding degenerate vector bases can become a problem. Imaginary loudspeaker insertion and downmix is shown to be a robust method that improves the behavior, in particular for smaller surround-with-height loudspeaker layouts. The chapter concludes with some practical examples using free software tools that accomplish amplitude panning on vector bases.

Vector-base amplitude panning (VBAP) was extensively described and investigated in [2], alongside the stabilization of moving sources by adding spread with multiple-direction amplitude panning (MDAP) [3]. Since then, VBAP and MDAP have become the most common and popular amplitude panning techniques, as they are particularly robust and can automatically adapt to specific playback layouts.

#### **3.1 Vector-Base Amplitude Panning (VBAP)**

Assuming the $r_\mathrm{V}$ model to predict the perceived direction, an intended auditory event at a panning direction $\boldsymbol{\theta}$, which we call the *virtual source*, can theoretically be controlled by the criterion according to V. Pulkki [2]

$$\boldsymbol{\theta} = \sum\_{l=1}^{L} \tilde{\boldsymbol{g}}\_l \,\boldsymbol{\theta}\_l. \tag{3.1}$$

Here, $\boldsymbol{\theta}_l$ are the direction vectors of the loudspeakers involved, and the amplitude weights $\tilde{g}_l$ need to be normalized for constant loudness

$$g_l = \frac{\tilde{g}_l}{\sqrt{\sum_{l=1}^{L}\tilde{g}_l^2}}.\tag{3.2}$$

Moreover, the weights $g_l$ should always stay positive to avoid in-head localization or other irritating listening experiences. For loudspeaker rings around the horizon, always 1 or 2 loudspeakers contribute to the auditory event; for loudspeakers arranged on a surrounding sphere, always 1 up to 3 loudspeakers are used, whose directions must enclose the direction of the desired auditory event, the virtual source. For the directional stability of the auditory event, the angle enclosed between the loudspeakers should stay smaller than 90◦.

The system of equations for VBAP [2] uses 3 loudspeaker directions and gains to model the panning direction *θ*

$$\boldsymbol{\theta} = [\boldsymbol{\theta}\_1, \boldsymbol{\theta}\_2, \boldsymbol{\theta}\_3] \begin{bmatrix} \tilde{\boldsymbol{g}}\_1 \\ \tilde{\boldsymbol{g}}\_2 \\ \tilde{\boldsymbol{g}}\_3 \end{bmatrix} = \mathbf{L} \cdot \tilde{\mathbf{g}} \qquad \Rightarrow \tilde{\mathbf{g}} = \mathbf{L}^{-1} \boldsymbol{\theta}, \qquad \mathbf{g} = \frac{\tilde{\mathbf{g}}}{\|\tilde{\mathbf{g}}\|}. \tag{3.3}$$

The selection of the activated loudspeaker triplet is preceded by forming all triplets of the convex hull spanned by the given playback loudspeakers. To find the loudspeaker triplet that needs to be activated, the list of all triplets is searched for the one with all-positive weights, $g_1 \geq 0$, $g_2 \geq 0$, $g_3 \geq 0$.
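The triplet search and Eq. (3.3) can be sketched as follows (a minimal Python illustration, assuming the convex-hull triangulation is already given as a list of unit-vector triplets; helper and function names are our own):

```python
import math

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def vbap_gains(theta, triplets):
    """Return (triplet index, normalized gains) for panning direction theta.
    Cramer's rule solves g~ = L^{-1} theta for L = [theta_1, theta_2, theta_3],
    cf. Eq. (3.3); the first triplet with all-nonnegative gains is selected."""
    for k, (t1, t2, t3) in enumerate(triplets):
        det = _dot(t1, _cross(t2, t3))
        if abs(det) < 1e-9:
            continue  # degenerate vector base
        g = (_dot(theta, _cross(t2, t3)) / det,
             _dot(t1, _cross(theta, t3)) / det,
             _dot(t1, _cross(t2, theta)) / det)
        if all(gi >= -1e-9 for gi in g):
            n = math.sqrt(sum(gi * gi for gi in g))   # loudness normalization
            return k, tuple(gi / n for gi in g)
    return None
```

Panning exactly onto a loudspeaker direction activates that loudspeaker alone, illustrating the non-smooth width behavior discussed below.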

Figure 3.1 shows the localization curve for VBAP between a loudspeaker at 0◦ and 45◦ for a centrally seated listener and one shifted to the left. The experiment is described in [4]: results were gathered on a 1.8 m circle of 8 loudspeakers, and listeners indicated the perceived direction by naming numbers from a 5◦ scale mounted on the loudspeaker setup. Black whiskers of the results (95% confidence intervals and medians) for the centrally seated listener indicate a mismatch between the slope of the perceived angles and the panning angles with VBAP; the ideal curve is represented by the dashed line, and the mismatch can be understood by the better match of other exponents γ in Fig. 2.6. The directional spread is quite narrow. For an off-center, left-shifted listening position, the perceived directions are shown in terms of a 5◦ histogram (gray bubbles) in Fig. 3.1. For this off-center position, it becomes clear that the closest loudspeaker dominates localization within a third of the panning directions. Still, the directional

**Fig. 3.1** Perceived directions for VBAP between loudspeakers at 0◦ and 45◦ from [4]. 95% confidence intervals and medians (black) are for a centrally seated listener in a circle of 2.5m radius. Localization for left-shifted listener (1.25m) can become bi-modal, so that 5◦ bubble histogram is shown (gray)

mapping seems to be monotonic with the panning angle, and the perceived direction stays within the loudspeaker pair, which is a robust result, at least.

In Fig. 3.2 we see that responses from [5], in which the panning angle on amplitude-panned lateral loudspeaker pairs was adjusted to match reference loudspeakers set up in steps of 15◦, fairly match the reference directions using VBAP. The $r_\mathrm{E}$ vector model (black curve) delivers a better match, with only one exception at 105◦. This motivates VBIP as an alternative strategy.

**Fig. 3.2** VBAP angles on a 60◦-spaced horizontal loudspeaker ring starting at 0◦ (**a**) or 30◦ (**b**), perceptually adjusted to match panned pink noise with harmonic-complex acoustic reference in 15◦ steps, from [5]; black curve shows *r*<sup>E</sup> model prediction

**Fig. 3.3** The width measure 2 arccos *r*E for a virtual source on a horizontal and a vertical trajectory (45◦ azimuth) using VBAP on an octahedral arrangement

*Vector-Base Intensity Panning (VBIP)*. With nearly the same set of equations, but improving the perceptual mapping by the squares of the weights, the auditory event can be controlled corresponding to the direction of the *r*<sup>E</sup> vector

$$\boldsymbol{\theta} = [\boldsymbol{\theta}_1,\, \boldsymbol{\theta}_2,\, \boldsymbol{\theta}_3] \begin{bmatrix} \tilde{g}_1^2 \\ \tilde{g}_2^2 \\ \tilde{g}_3^2 \end{bmatrix} = \mathbf{L}\cdot\tilde{\mathbf{g}}_{\mathrm{sq}} \quad\Rightarrow\quad \tilde{\mathbf{g}}_{\mathrm{sq}} = \mathbf{L}^{-1}\,\boldsymbol{\theta}, \qquad \tilde{\mathbf{g}} = \begin{bmatrix} \sqrt{\tilde{g}_{\mathrm{sq},1}} \\ \sqrt{\tilde{g}_{\mathrm{sq},2}} \\ \sqrt{\tilde{g}_{\mathrm{sq},3}} \end{bmatrix}, \qquad \mathbf{g} = \frac{\tilde{\mathbf{g}}}{\|\tilde{\mathbf{g}}\|}. \tag{3.4}$$

This formulation appears more contemporary due to the excellent match of the *r*<sup>E</sup> model to predict experimental results, as shown earlier.

*Non-smooth VBAP/VBIP width*. If one of the loudspeakers is exactly aligned with the virtual source for either VBAP or VBIP, e.g. $\boldsymbol{\theta}_1 = \boldsymbol{\theta}$, the resulting gains are $g_{1,2,3} = (1, 0, 0)$, and therefore only 1 loudspeaker will be activated. For a virtual source between 2 loudspeakers, e.g. $\boldsymbol{\theta}_1 + \boldsymbol{\theta}_2 \propto \boldsymbol{\theta}$, we obtain $g_{1,2,3} = (1, 1, 0)/\sqrt{2}$, and hereby only 2 loudspeakers will be active. This behavior yields audible variation of the perceived width and coloration. For virtual source movements crossing a common edge of neighboring loudspeaker triplets, there will often be pronounced, unexpected jumps.

Figure 3.3 illustrates the variation of the perceived width with VBAP on an octahedral arrangement of loudspeakers in the directions $\boldsymbol{\theta}_l^\mathrm{T} \in \{[\pm1, 0, 0],\, [0, \pm1, 0],\, [0, 0, \pm1]\}$.

#### **3.2 Multiple-Direction Amplitude Panning (MDAP)**

In order to adjust the *r*<sup>E</sup> or *r*<sup>V</sup> vector not only directionally but also in length, and thus to control the number of active loudspeakers for moving sound objects, Pulkki extended VBAP to multiple-direction amplitude panning (MDAP [3]). Hereby not only the perceived width but also the coloration can be held constant.

*Direction spread in MDAP*. MDAP employs more than one virtual source distributed around the panning direction as a directional spreading strategy. For horizontal loudspeaker rings, MDAP can consist of a pair of virtual VBAP sources at the angles $\varphi_\mathrm{s} \pm \alpha$ around the panning direction $\varphi_\mathrm{s}$. In a ring of $L$ loudspeakers with a uniform angular spacing of $\frac{360^\circ}{L}$, the angle $\alpha = 90\%\cdot\frac{180^\circ}{L}$ yields an optimally flat width for all panning directions, as shown for $L = 6$ in the comparison between MDAP and VBAP in Fig. 3.4. Moreover, MDAP seems to equalize the aiming of the $r_\mathrm{E}$ measure to the aiming of the $r_\mathrm{V}$ measure, which is the one controlled by VBAP and MDAP.
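A sketch of this horizontal-ring strategy (our own Python illustration; the exact pairwise gains follow the $r_\mathrm{V}$ vector base of the enclosing loudspeaker pair):

```python
import math

def mdap_ring_gains(phi_s, L, spread=None):
    """MDAP gains on a ring of L equally spaced loudspeakers (speaker l at
    l*360/L degrees): a pair of virtual vector-base sources at phi_s +/- alpha,
    with alpha = 0.9 * 180/L by default, summed and loudness-normalized."""
    step = 360.0 / L
    alpha = 0.9 * 180.0 / L if spread is None else spread
    g = [0.0] * L
    for phi in (phi_s - alpha, phi_s + alpha):
        i_raw = math.floor(phi / step)         # lower speaker of enclosing pair
        p = math.radians(phi - i_raw * step)   # angle within the pair
        i, j = int(i_raw) % L, int(i_raw + 1) % L
        s = math.radians(step)
        g[i] += math.sin(s - p) / math.sin(s)  # 2D vector-base pair gains
        g[j] += math.sin(p) / math.sin(s)
    n = math.sqrt(sum(x * x for x in g))
    return [x / n for x in g]
```

For $L = 6$ and $\varphi_\mathrm{s} = 0^\circ$, three loudspeakers are active, symmetrically around the front, instead of the single loudspeaker VBAP would use.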

*Listening experiment results*. Experiments from [4] in Fig. 3.5 investigate the perceived width for two possible horizontal loudspeaker ring layouts, both with 45◦ spacings, but one starting at 0◦ ("0"), the other at 22.5◦ ("1/2"). Widths of MDAP with a direction spread of α = 22.5◦ are perceived as similar on both ring layouts, while VBAP yields significantly narrower results for panning onto the frontal loudspeaker in the "0" layout, which activates only a single loudspeaker. Note that VBAP1/2 and MDAP1/2 are identical with α = 22.5◦ and were treated as one condition.

Moreover, a more constant width measure also describes a more constant number of activated loudspeakers while panning. Figure 3.6 shows that listeners can hear the difference in coloration changes with rotatory panning using pink noise and a constant speed. The figure shows that coloration fluctuations of MDAP are always clearly smaller than with VBAP on similar loudspeaker rings. Moreover, coloration changes are more pronounced on rings of 16 loudspeakers than with 8 loudspeakers, which is explained by their faster fluctuation.

Figure 3.7 shows the results from [6] for a central and a left-shifted off-center listening position when using MDAP on an 8-channel ring of loudspeakers. At the central listening position, the perceived directional spread around the loudspeaker positions 0◦ and 45◦ obviously increases as expected, as indicated by the whiskers (95% confidence intervals and medians). Moreover, the spread of MDAP seems to slightly decrease the slope mismatch between the underlying VBAP algorithm and the perceptual curve around the 22.5◦ direction.

Although MDAP enforces a larger number of active loudspeakers, its localization is still similarly robust as that of VBAP, also at off-center listening positions. The perceived direction can be assumed to stay confined within the strictly directionally limited activation of loudspeakers. Correspondingly, the gray 5◦-histogram bubbles of Fig. 3.7 indicate the perceived directions when the listener is located left-shifted off-center. While localization is slightly attracted by the closer loudspeaker at 0◦, the larger spread causes a more monotonic outcome that is less split than with VBAP in Fig. 3.1.

For a more exhaustive study, Frank used 6 loudspeakers on the horizon and gave the task to his listeners to align an MDAP pink-noise direction to match acoustical references every 15◦ (harmonic complex) by adjusting the panning direction [5]. The results in Fig. 3.8 contain 24 answers from 6 subjects responding four times

**Fig. 3.7** Perceived directions for MDAP panning on an 8-channel 2.5m radius loudspeaker ring within the interval [0◦, 45◦] at a central (black medians and 95%- confidence whiskers) and 1.25 m left-shifted off-center listening position (gray 5◦ bubble histogram); dashed line indicates ideal panning curve

**Fig. 3.8** MDAP pink-noise directions on horizontal rings of 60◦-spaced loudspeakers adjusted to perceptually match reference loudspeaker directions (harmonic complex) every 15◦. Markers and whiskers indicate 95% confidence intervals and medians, the black curve the *r*<sup>E</sup> vector model

(by repetition and symmetrization). The black line shows directions indicated by the *r*<sup>E</sup> vector model for the tested conditions. Obviously, the confidence intervals of the adjusted MDAP angles match quite well both the reference directions and predictions by the *r*<sup>E</sup> vector model, in particular for angles between 0◦ and 90◦ (except 75◦) for the ring starting at 0◦, and from 0◦ to 120◦ for the 30◦-rotated ring. The mismatch is much less than 4◦ for panning angles ≤ 90◦.

**Fig. 3.9** The width measure 2 arccos *r*E for virtual sources on a horizontal and vertical path on an octahedron setup using MDAP with additional 8 half-amplitude virtual sources at 45◦ distance to the main virtual source

*MDAP with 3D loudspeaker layouts*. For more arbitrary 3D loudspeaker arrangements, the multiple directions could be arranged in a ring, see Fig. 3.9. This arrangement uses 8 additional virtual sources inclined by 45◦ wrt. the main virtual source.

At least mathematically, however, accurately matching the desired $r_\mathrm{V}$ or $r_\mathrm{E}$ vector in direction and length on irregular loudspeaker arrangements requires post-optimization of the amplitudes and angles of the virtual sources, cf. [7]. Non-uniform $r_\mathrm{V}$ vector lengths of the individual virtual sources involved distort the resultant vector. In particular, their superposition is biased towards those virtual source directions with the longest $r_\mathrm{V}$ vectors. Epain's article [7] proposes an optimization that retrieves the optimal orientation and weighting of the multiple virtual sources for every panning direction.

#### **3.3 Challenges in 3D Triangulation: Imaginary Loudspeaker Insertion and Downmix**

Surrounding loudspeaker hemispheres typically exhibit the following two problems:


The problem of unfavorable or ambiguous triangulations into loudspeaker triplets appears subtle; however, it can cause clearly audible deficiencies, especially when ambiguous triangulation yields asymmetric behavior between left and right, e.g., for the top, rear, and lateral directions, where we would manually define loudspeaker quadrilaterals instead of triangles, see [9].

As surrounding loudspeaker hemispheres are typically open by 180◦ towards below, VBAP/VBIP/MDAP is numerically unstable and theoretically useless for any panning direction below. Although the absence of loudspeakers below renders downwards amplitude panning theoretically infeasible, it is still reasonable to preserve signals of virtual sources that are meant for playback on spherically surrounding setups.

In the case of asymmetric loudspeaker rectangles, see Fig. 3.10, and a missing lower hemisphere of surrounding loudspeakers, the insertion of one or more *imaginary loudspeakers* in the vertical direction (nadir) or in the middle of the rectangle (the average direction vector) has proven to be a useful strategy, e.g. in [10]. Any imaginary loudspeaker aims either at extending the admissible triangulation towards open parts of the surround loudspeaker setup, or at covering parts with potential asymmetry, see [9].

**Fig. 3.10** VBAP on the ITU D (4 + 5 + 0) setup [8]. *Top row:* Insertion of an imaginary loudspeaker at the nadir preserves loudness of downward-panned signals, shown for a vertical path and *E* values in dB for factors $\{\frac{1}{\sqrt{5}}, \frac{1}{2\sqrt{5}}, 0\}$ to re-distribute the signal to the 5 existing horizontal loudspeakers. *Middle row:* Due to typical triangulation, two left-right mirrored vertical paths (45◦ azimuth) yield asymmetric behavior, as shown by the $2\arccos\|r_\mathrm{E}\|$ measure. *Bottom row:* Insertion of an imaginary loudspeaker at 65◦ fixes the symmetry and feeds its signal with a factor $\frac{1}{\sqrt{4}}$ to the 4 existing neighboring loudspeakers

The signal of the imaginary loudspeaker can be dealt with in two ways


#### **3.4 Practical Free-Software Examples**

#### *3.4.1 VBAP/MDAP Object for Pd*

There is a classic VBAP/MDAP implementation by Ville Pulkki that is available as an external for Pure Data (Pd). The example in Fig. 3.11 illustrates its use together with some other useful externals in Pd. The software requirements are:

**Fig. 3.11** Vector-Base/Multi-Direction Amplitude Panning (VBAP/MDAP) example in pure data (Pd) using Pulkki's [vbap] external for an octahedral layout


#### *3.4.2 SPARTA Panner Plugin*

The SPARTA Panner under http://research.spa.aalto.fi/projects/sparta\_vsts/plugins.html provides a vector-base amplitude panning (VBAP) and multiple-direction amplitude panning (MDAP) interface, see Fig. 3.12, with frequency-dependent loudness normalization by $\sqrt[p]{\sum_{l=1}^{L} g_l^p}$ adjustable to the listening conditions, see Laitinen [11].

The parameter DTT can be varied between 0 (standard, frequency-independent VBAP normalization, i.e. diffuse-field normalization), 0.5 for typical listening environments, and 1 for the anechoic chamber. The plugin allows the azimuth and elevation angles of multiple panning directions (if more than one input signal is used) and of the playback loudspeakers either to be entered manually or to be imported from/exported to preset files. Of course, all panning directions can be time-varying and moved by mouse, automation, or controllers.
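The effect of such a normalization can be sketched numerically. The following is a minimal illustration, not SPARTA's actual implementation: it assumes a simple, frequency-independent interpolation of the norm exponent $p$ from 2 (energy normalization, DTT = 0) to 1 (amplitude normalization, DTT = 1); in the plugin, the exponent additionally varies with frequency.

```python
import numpy as np

def normalize_gains(g, dtt=0.0):
    """Normalize panning gains by the p-norm (sum_l |g_l|^p)^(1/p).

    The linear mapping p = 2 - dtt is an assumption for illustration:
    dtt=0 -> p=2 (energy normalization), dtt=1 -> p=1 (amplitude normalization).
    """
    g = np.asarray(g, dtype=float)
    p = 2.0 - dtt
    return g / np.sum(np.abs(g)**p)**(1.0/p)

# energy-normalized gains sum to 1 in power, amplitude-normalized in amplitude
g_energy = normalize_gains([0.6, 0.8], dtt=0.0)
g_amplitude = normalize_gains([0.6, 0.8], dtt=1.0)
```

With `dtt=0` the squared gains sum to one (suitable for diffuse listening rooms), with `dtt=1` the gains themselves sum to one (suitable for anechoic conditions).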

**Fig. 3.12** The Panner VST plug-in from Aalto University's SPARTA plug-in suite manages Vector-Base Amplitude Panning within sequencers supporting VST

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 4 Ambisonic Amplitude Panning and Decoding in Higher Orders**

*…the second-order Ambisonic system offers improved imaging over a wider area than the first-order system and is suitable for larger rooms.*

Jeffrey S. Bamford [1], Canadian Acoustics, 1994.

**Abstract** Already in the 1970s, the idea of using continuous harmonic functions of scalable resolution was described by Cooper and then by Gerzon, who introduced the name Ambisonics. This chapter starts by reviewing the properties of first-order horizontal Ambisonics, using an interpretation in terms of panning functions. The mathematical formulations required for 3D higher-order Ambisonics are then developed, with the idea of improving the directional resolution. Based on this formalism, ideal loudspeaker layouts can be defined for constant loudness, localization, and width, according to the previous models. The chapter discusses how Ambisonics can be decoded to less ideal, typical loudspeaker setups for studios, concerts, sound-reinforcement systems, and to headphones. The behavior is analyzed by a rich variety of listening experiments and for various decoding applications. The chapter concludes with example applications using free software tools.

Cooper [2] used higher-order angular harmonics to formulate circular panning of auditory events. Due to the work of Fellgett [3], Gerzon [4], and Craven [5], the term Ambisonics became common for technology using spherical harmonic functions. Around the early 2000s, the development of higher-order Ambisonic panning and decoding was pioneered most notably by Bamford [6], Malham [7], Poletti [8], Jot [9], and Daniel [10], by Ward and Abhayapala [11] and Dickins [12], and, at the lab of the authors, by Sontacchi [13].

Another leap happened around 2010, when Ambisonic decoding to loudspeakers was largely improved by considering regularization methods [14], singular-value decomposition [15], and All-Round Ambisonic Decoding (AllRAD) [15, 16], a combination of vector-base panning techniques with Ambisonics that yields the most robust and flexible higher-order decoding method known today.

For headphones, after the work of Jot [9] that outlined the basic problems of binaural decoding in the 1990s, Sun, Bernschütz, Ben-Hur, and Brinkmann [17–19] made important contributions to binaural decoding, and we consider the TAC and MagLS decoders by Zaunschirm and Schörkhuber [20, 21] as the essential binaural decoders. Both remove HRTF delays or optimize HRTF phases at high frequencies to avoid spectral artifacts. By interaural covariance correction, MagLS/TAC manage to play back diffuse fields consistently, using the formalism of Vilkamo et al. [22].

#### **4.1 Direction Spread in First-Order 2D Ambisonics**

In 2D first-order Ambisonics as discussed in Chap. 1, the directional mapping of a single sound source from the angle $\varphi_\mathrm{s}$ to the direction of each loudspeaker $\varphi$ is described by the shape of the panning function (or direction-spread function) in Eq. (1.17). The directional spreading is not infinitely narrow, but determined by what first-order directivity patterns can represent. Consequently, sound from the angle $\varphi_\mathrm{s}$ will be mapped by a dipole pattern aligned with the source plus an additional omnidirectional pattern. We can introduce a spread parameter $a$ to make the directional spread to the loudspeaker system adjustable: cardioid-shaped for $a = 1$, 2D-supercardioid-shaped for $a = \sqrt{2}$, or 2D-hypercardioid-shaped for $a = 2$, using:

$$g(\varphi) = 1 + a \cos(\varphi - \varphi_\mathrm{s}).\tag{4.1}$$

This function represents how first-order Ambisonic panning would distribute a mono signal to loudspeakers. With the loudspeaker positions described by the set of angles {ϕ*l*}, a vector of amplitude-panning gains with an entry for each loudspeaker could be determined by sampling the direction-spread function:

$$\boldsymbol{g} = \begin{bmatrix} g_1 \\ \vdots \\ g_L \end{bmatrix} = 1 + a \begin{bmatrix} \cos(\varphi_1 - \varphi_\mathrm{s}) \\ \vdots \\ \cos(\varphi_L - \varphi_\mathrm{s}) \end{bmatrix}. \tag{4.2}$$

With these gain values, we evaluate models of perceived loudness, direction, and width, as introduced in Chap. 2, in order to enter a discussion of perceptual goals.

If the loudspeaker directions $\{\boldsymbol{\theta}_l\}$ are chosen suitably, it is possible to obtain panning-independent loudness, direction, and width measures $E = \sum_l g_l^2$, $\boldsymbol{r}_\mathrm{E} = \frac{1}{E}\sum_l g_l^2\,\boldsymbol{\theta}_l$, and $\frac{5}{8}\,\frac{180^\circ}{\pi}\,2\arccos \|\boldsymbol{r}_\mathrm{E}\|$. How is it done?

For first-order 2D Ambisonics, it is theoretically optimal to use at least a ring of 4 loudspeakers with uniform angular spacing and *<sup>a</sup>* <sup>=</sup> <sup>√</sup>2, which is easily checked with the aid of a computer, cf. Fig. 4.1, and explained below and in Sect. 4.4.
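The check by computer is easily reproduced. A minimal numpy sketch (function name `measures_2d` is ours, not from the book) samples the first-order panning function of Eq. (4.2) on a uniform ring and evaluates the $E$ and $\boldsymbol{r}_\mathrm{E}$ measures of Chap. 2:

```python
import numpy as np

def measures_2d(L, a, phi_s):
    """E and rE for first-order 2D panning sampled on L uniform loudspeakers."""
    phi_l = 2*np.pi*np.arange(L)/L                     # uniform ring
    g = 1 + a*np.cos(phi_l - phi_s)                    # gains, Eq. (4.2)
    E = np.sum(g**2)                                   # loudness measure
    rE = (g**2) @ np.stack([np.cos(phi_l), np.sin(phi_l)], axis=1) / E
    return E, np.linalg.norm(rE), np.arctan2(rE[1], rE[0])

# with L = 4 and a = sqrt(2), E and |rE| are panning-invariant,
# and rE points exactly at phi_s
E, r, phi_E = measures_2d(4, np.sqrt(2), phi_s=0.3)
```

For any $\varphi_\mathrm{s}$ this yields $E = 8$, $\|\boldsymbol{r}_\mathrm{E}\| = 1/\sqrt{2}$, and a direction exactly at $\varphi_\mathrm{s}$; with $L = 3$, by contrast, the $\boldsymbol{r}_\mathrm{E}$ length starts fluctuating with the panning direction.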

*Direction spread in FOA*. The panning-function interpretation with its directional spread has some similarity to MDAP and its attempt to directionally spread an amplitude-panned signal by discrete virtual sources within $\pm\alpha = \arccos \|\boldsymbol{r}_\mathrm{E}\|$ around the panning direction. The virtual direction spread of first-order Ambisonics is described by its continuous panning function $g(\varphi)$ in Eq. (4.1). To inspect the continuous function by the $r_\mathrm{E}$ measure defined in Eq. (2.7), we may evaluate an integral over the panning function instead of the sum. Because of the symmetry around $\varphi_\mathrm{s}$, we may set $\varphi_\mathrm{s} = 0$ for convenience, which obviously causes $r_{\mathrm{E},y} = 0$, and evaluate

$$r_{\mathrm{E},x} = \frac{\int_0^{2\pi} g^2(\varphi) \cos\varphi\,\mathrm{d}\varphi}{\int_0^{2\pi} g^2(\varphi)\,\mathrm{d}\varphi} = \frac{\int_0^{2\pi} [1 + 2a\cos\varphi + a^2\cos^2\varphi]\cos\varphi\,\mathrm{d}\varphi}{\int_0^{2\pi} [1 + 2a\cos\varphi + a^2\cos^2\varphi]\,\mathrm{d}\varphi} = \frac{a}{1 + \frac{a^2}{2}}.\tag{4.3}$$

The maximum of $r_{\mathrm{E},x} = \frac{2a}{2+a^2}$ is found by setting $\frac{\mathrm{d}}{\mathrm{d}a} r_{\mathrm{E},x} = \frac{4+2a^2-4a^2}{(2+a^2)^2} = 0$, hence at $a = \sqrt{2}$. Consequently, the 2D max-$r_\mathrm{E}$ weight yields $r_{\mathrm{E},x} = \frac{\sqrt{2}}{2} = \frac{1}{\sqrt{2}}$ and the angle $\arccos r_\mathrm{E} = 45^\circ$. This resembles a 2D-MDAP-equivalent source spread of $\pm 45^\circ$. Note that first-order Ambisonics cannot map to a smaller spread than this. Only higher orders permit reducing this spread further, to any desired angle below $90^\circ$.
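Equation (4.3) and its maximum can be verified numerically; a small sketch using a uniform Riemann sum over the circle (which is exact for trigonometric polynomials):

```python
import numpy as np

phi = np.linspace(0, 2*np.pi, 4096, endpoint=False)  # uniform grid on circle

def rE_x(a):
    """rE,x of the continuous first-order panning function, Eq. (4.3)."""
    g2 = (1 + a*np.cos(phi))**2
    return np.mean(g2*np.cos(phi)) / np.mean(g2)

a_grid = np.linspace(0.1, 3.0, 291)
vals = np.array([rE_x(ai) for ai in a_grid])
a_opt = a_grid[np.argmax(vals)]        # close to sqrt(2) ~ 1.414
```

The numeric maximum lies at $a \approx \sqrt{2}$ with $r_{\mathrm{E},x} = 1/\sqrt{2}$, matching the derivative argument above.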

*Ideal loudspeaker layouts*. Not only are the directional aiming of the virtual, continuous first-order Ambisonic panning function ideal and its width panning-invariant; its loudness measure is panning-invariant, too. However, decoding to a physical loudspeaker setup can degrade this ideal behavior. For which loudspeaker layouts are these properties preserved by sampling decoding?

The 2D first-order Ambisonic components ($W$, $X$, $Y$) correspond to the patterns $\{1, \cos\varphi, \sin\varphi\}$, a first-order Fourier series in the angle. When sampling the playback directions by $L = 3$ uniformly spaced loudspeakers on the horizon, the sampling theorem for this series is already fulfilled. Accordingly, Parseval's theorem ensures panning-invariant loudness $E$ for any panning direction.

For an ideal $r_\mathrm{E}$ measure, however, one more loudspeaker is required, $L \geq 4$, for a uniformly spaced horizontal ring. To explain this increase exhaustively, the concept of circular/spherical polynomials and $t$-designs will be introduced in this chapter. As a brief explanation, $g^2(\varphi)$ is a second-order expression; therefore, to represent the ideally constant loudness $E = \int g^2(\varphi)\,\mathrm{d}\varphi$ of the continuous panning function consistently after discretization, $E = \frac{2\pi}{L}\sum_l g_l^2$, it requires $L = 3$ uniformly spaced loudspeakers, as argued before. By contrast, the expressions $g^2(\varphi)\cos\varphi$ and $g^2(\varphi)\sin\varphi$ are third-order and appear in $\boldsymbol{r}_\mathrm{E}\,E = \int g^2(\varphi)\,[\cos\varphi, \sin\varphi]^\mathrm{T}\,\mathrm{d}\varphi$. Consequently, ideal mapping of $\boldsymbol{r}_\mathrm{E}$ (direction and width) requires at least one more loudspeaker, $L = 4$, for a uniformly spaced arrangement to make the continuous and the discretized form $\boldsymbol{r}_\mathrm{E}\,E = \frac{2\pi}{L}\sum_l g_l^2\,[\cos\varphi_l, \sin\varphi_l]^\mathrm{T}$ perfectly equal.
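The counting argument can be checked directly; a sketch comparing $L = 3$ and $L = 4$ uniform rings (assuming $a = \sqrt{2}$):

```python
import numpy as np

a = np.sqrt(2)

def rE_length(L, phi_s):
    """Length of the discretized rE vector for first-order panning on L loudspeakers."""
    phi_l = 2*np.pi*np.arange(L)/L
    g2 = (1 + a*np.cos(phi_l - phi_s))**2
    rE = g2 @ np.stack([np.cos(phi_l), np.sin(phi_l)], axis=1) / g2.sum()
    return np.linalg.norm(rE)

phis = np.linspace(0, 2*np.pi, 73)
r3 = np.array([rE_length(3, p) for p in phis])   # fluctuates: g^2*cos is 3rd order
r4 = np.array([rE_length(4, p) for p in phis])   # constant 1/sqrt(2)
```

`r3` varies strongly with the panning direction, whereas `r4` stays at $1/\sqrt{2}$: three loudspeakers integrate $g^2$ (second order) exactly and thus preserve $E$, but the third-order integrands of $\boldsymbol{r}_\mathrm{E}\,E$ require $L = 4$.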

*Towards a higher-order panning function*. An Nth-order cardioid pattern is obtained from the cardioid pattern by taking its Nth power

$$g_\mathrm{N}(\varphi) = \frac{1}{2^\mathrm{N}} (1 + \cos\varphi)^\mathrm{N},$$

which makes it narrower. With $\mathrm{N} = 2$, this becomes, using $\cos^2\varphi = \frac{1}{2}(1 + \cos 2\varphi)$,

$$g_2(\varphi) = \frac{1}{4}(1 + 2\cos\varphi + \cos^2\varphi) = \frac{1}{8}(3 + 4\cos\varphi + \cos 2\varphi).$$

More generally, the Chebyshev polynomials $T_m(\cos\varphi) = \cos m\varphi$, cf. [23, Eq. 3.11.6], can be used to argue that there is always a fully equivalent cosine series describing the higher-order 2D panning function in the azimuth angle

$$\mathrm{g}(\varphi) = \sum\_{m=0}^{N} a\_m \cos m\varphi. \tag{4.4}$$

*Rotated panning function*. In first-order Ambisonics, panning functions consist of an omnidirectional part, $\cos(0\varphi) = 1$, and a figure-of-eight towards $x$, $\cos\varphi$; but that was not all: recording and playback also required a figure-of-eight pattern towards $y$, $\sin\varphi$. The additional component makes it possible to express rotated first-order directivities by a basis set of fixed directivities. For higher orders, a panning function rotated to a non-zero aiming $\varphi_\mathrm{s} \neq 0$

$$g(\varphi - \varphi_\mathrm{s}) = \sum_{m=0}^{\mathrm{N}} a_m \cos[m(\varphi - \varphi_\mathrm{s})] \tag{4.5}$$

can be re-expressed by the addition theorem $\cos(\alpha - \beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta$ into a series also involving the sinusoids (the odd-symmetric part of a Fourier series),

$$\begin{split} g(\varphi - \varphi_\mathrm{s}) &= \sum_{m=0}^{\mathrm{N}} a_m \left[ \cos m\varphi_\mathrm{s} \cos m\varphi + \sin m\varphi_\mathrm{s} \sin m\varphi \right] \\ &= \sum_{m=0}^{\mathrm{N}} a_m^{(\mathrm{c})} \cos m\varphi + \sum_{m=1}^{\mathrm{N}} a_m^{(\mathrm{s})} \sin m\varphi. \end{split} \tag{4.6}$$

We conclude: *Higher-order Ambisonics in 2D (and the associated set of theoretical microphone directivities) is based on the Fourier series in the azimuth angle* ϕ.

#### **4.2 Higher-Order Polynomials and Harmonics**

The previous section required that the direction and length of the $\boldsymbol{r}_\mathrm{E}$ vector resulting from amplitude panning on loudspeakers match the desired auditory event direction and width. Harmonic functions with strict symmetry around a panning direction $\boldsymbol{\theta}_\mathrm{s}$ will help us achieve this goal and define good sampling.

Regardless of the dimensions, be it in 2D or 3D, we desire to define continuous and resolution-limited axisymmetric functions around the panning direction **θ**<sup>s</sup> to fulfill our perceptual goals of a panning-invariant loudness *E*, width *r*E, and perfect alignment between panning direction **θ**<sup>s</sup> and localized direction *r*E. Then we hope to find suitable directional discretization schemes for ideal loudspeaker layouts, so that the measures *E* and *r*<sup>E</sup> are perfectly reconstructed in playback.

The projection of a variable direction vector $\boldsymbol{\theta}$ onto the panning direction $\boldsymbol{\theta}_\mathrm{s}$ always yields the cosine of the enclosed angle, $\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta} = \cos\phi$, no matter whether it is in two or three dimensions. Constructing the panning function based on this projection therefore readily meets the desired goals. The $m$th power thereof, $(\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})^m = \cos^m\phi$, helps to build an Nth-order power series $g = \sum_{m=0}^{\mathrm{N}} a_m (\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})^m$ to describe a virtual Ambisonic panning function.

For 2D, such a *circular polynomial* $g = \sum_{m=0}^{\mathrm{N}} a_m (\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})^m$ contains all $(\mathrm{N}+1)(\mathrm{N}+2)/2$ mixed powers of the entries of the direction vectors $\boldsymbol{\theta}_\mathrm{s} = [\theta_{x\mathrm{s}}, \theta_{y\mathrm{s}}]^\mathrm{T}$ and $\boldsymbol{\theta} = [\theta_x, \theta_y]^\mathrm{T}$, by the binomial expansion $(\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})^m = (\theta_{x\mathrm{s}}\theta_x + \theta_{y\mathrm{s}}\theta_y)^m = \sum_{k=0}^{m}\binom{m}{k}(\theta_{x\mathrm{s}}\theta_x)^k(\theta_{y\mathrm{s}}\theta_y)^{m-k}$. However, we could already recognize that it only takes $2\mathrm{N}+1$ functions to express $g = \sum_{m=0}^{\mathrm{N}} a_m (\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})^m = \sum_{m=0}^{\mathrm{N}} a_m \cos^m\phi$: first, in the relative azimuth $\phi = \varphi - \varphi_\mathrm{s}$, the polynomial relates to a harmonic series of $\mathrm{N}+1$ cosines or Chebyshev polynomials, $g = \sum_{m=0}^{\mathrm{N}} b_m \cos m\phi = \sum_{m=0}^{\mathrm{N}} b_m\,T_m(\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})$. Then, in terms of the absolute azimuth $\varphi$, the trigonometric addition theorem re-expresses the series into one of $\mathrm{N}+1$ cosines and $\mathrm{N}$ sines, with $T_m(\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta}) = \cos[m(\varphi - \varphi_\mathrm{s})] = \cos m\varphi_\mathrm{s}\cos m\varphi + \sin m\varphi_\mathrm{s}\sin m\varphi$. As shown in the upcoming section, we can alternatively obtain such orthonormal harmonic functions by solving the second-order differential equation that is generally used to define harmonics, which bears the later benefit that the same approach defines spherical harmonics in three space dimensions.

*Spherical polynomials* are similar, $g = \sum_{n=0}^{\mathrm{N}} a_n (\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})^n$, involving the expressions $(\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})^n = (\theta_{x\mathrm{s}}\theta_x + \theta_{y\mathrm{s}}\theta_y + \theta_{z\mathrm{s}}\theta_z)^n = \sum_{k=0}^{n}\sum_{l=0}^{n-k}\binom{n}{k}\binom{n-k}{l}(\theta_{z\mathrm{s}}\theta_z)^k(\theta_{x\mathrm{s}}\theta_x)^l(\theta_{y\mathrm{s}}\theta_y)^{n-k-l}$. Again, all these $(\mathrm{N}+1)(\mathrm{N}+2)(\mathrm{N}+3)/6$ combinations would be too many to form an orthogonal set of basis functions. Moreover, while the different cosine harmonics are orthogonal axisymmetric functions in 2D, they are not in 3D. On the sphere, the $\mathrm{N}+1$ orthogonal Legendre polynomials $P_n(\cos\phi)$ replace the cosine series as a basis, $g = \sum_{n=0}^{\mathrm{N}} c_n\,P_n(\cos\phi)$, as shown below. All mathematical derivations for the sphere rely on the definition of harmonics. They result in $(\mathrm{N}+1)^2$ spherical harmonics and their addition theorem $\frac{2n+1}{4\pi}P_n(\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta}) = \sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_\mathrm{s})\,Y_n^m(\boldsymbol{\theta})$ as a basis in terms of absolute directions. Dickins' thesis is interesting for further reading [12].
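The orthogonality of the Legendre polynomials in the variable $\zeta = \cos\phi$ can be verified numerically; a minimal numpy sketch using Gauss-Legendre quadrature:

```python
import numpy as np
from numpy.polynomial import legendre

# Verify  int_{-1}^{1} P_n(z) P_k(z) dz = 2/(2n+1) * delta_nk  for n, k <= 5.
z, w = legendre.leggauss(16)   # quadrature exact up to polynomial degree 31
N = 5
P = np.stack([legendre.legval(z, np.eye(N+1)[n]) for n in range(N+1)])
gram = (P * w) @ P.T           # matrix of pairwise integrals
```

The Gram matrix comes out diagonal with entries $2/(2n+1)$, confirming that the $P_n$ form the orthogonal axisymmetric basis on the sphere, which the plain cosine harmonics do not.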

In both regimes, 2D and 3D, the concept of circular or spherical polynomials will be used to determine optimal layouts, so-called *t*-designs. Such *t*-designs are directional sampling grids that keep the information about the constant part of any circular (2D) or spherical (3D) polynomial of order $\mathrm{N} \leq t$. This is a mathematical key property exploited to determine requirements for preserving the $E$ and $\boldsymbol{r}_\mathrm{E}$ measures during Ambisonic playback with optimal loudspeaker setups. Moreover, *t*-designs simplify the numerical integration of circular or spherical harmonics needed to define state-of-the-art Ambisonic decoders or mapping effects.

#### **4.3 Angular/Directional Harmonics in 2D and 3D**

The Laplacian is defined in the D-dimensional Cartesian space as

$$\Delta = \sum_{j=1}^{D} \frac{\partial^2}{\partial x_j^2},\tag{4.7}$$

and for any function $f$, the Laplacian $\Delta f$ describes its curvature. Any harmonic function is proportional to its curvature by an eigenvalue $\lambda$,

$$
\Delta f = -\lambda \, f,\tag{4.8}
$$

and therefore it is an oscillatory function. Generally, eigensolutions $\Delta f = -\lambda f$ of the Laplacian are called *harmonics*. For suitable eigenvalues $\lambda$, harmonics span an orthogonal set of basis functions that are typically used for Fourier expansion on a finite interval. It seems desirable to find such harmonics for functions exhibiting only directional dependencies, i.e. on the azimuth angle $\varphi$ in 2D, and on the azimuth and zenith angles $\varphi, \vartheta$ in 3D.

#### **4.4 Panning with Circular Harmonics in 2D**

For two dimensions, Appendix A.3.2 uses the generalized chain rule to convert the Laplacian of the 2D Cartesian coordinate system, $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$, to a polar coordinate system with the radius $r$ and the angle $\varphi$ to the $x$ axis, $\Delta = \frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial r^2} + \frac{1}{r^2}\frac{\partial^2}{\partial\varphi^2}$. For functions $\Phi = \Phi(\varphi)$ purely in the angle $\varphi$, all radial derivatives of $\Phi$ vanish and there remains ($\partial \to \mathrm{d}$)

$$\frac{\mathrm{d}^2}{\mathrm{d}\varphi^2}\Phi = -\lambda \, r^2 \, \Phi. \tag{4.9}$$

It only yields useful solutions for $\lambda\,r^2 = m^2$, $m \in \mathbb{Z}$, cf. Appendix A.3.4 and Fig. 4.2,

**Fig. 4.2** Circular harmonics with $m = -3, \dots, 3$ plotted as polar diagrams using the radius $R = 20\lg|\sqrt{\pi}\,\Phi_m|$ and grayscale to distinguish between positive (gray) and negative (black) signs

$$\Phi\_m = \frac{1}{\sqrt{2\pi}} \begin{cases} \sqrt{2}\sin(|m|\varphi), & \text{for } m < 0, \\ 1, & \text{for } m = 0, \\ \sqrt{2}\cos(m\varphi), & \text{for } m > 0, \end{cases} \tag{4.10}$$

which defines how to decompose panning functions of limited order $|m| \leq \mathrm{N}$. The harmonics are periodic in azimuth, and orthogonal and normalized (orthonormal) on the period $-\pi \leq \varphi \leq \pi$. Due to their completeness, any square-integrable function $g(\varphi)$ can be expanded into a series of the harmonics using coefficients $\gamma_m$

$$g(\varphi) = \sum_{m=-\infty}^{\infty} \gamma_m\,\Phi_m(\varphi). \tag{4.11}$$

For a known function *g*(ϕ), the coefficients γ*<sup>m</sup>* are obtained by the transformation integral

$$\gamma\_m = \int\_{-\pi}^{\pi} \mathbf{g}\left(\boldsymbol{\varphi}\right) \Phi\_m(\boldsymbol{\varphi}) \, \mathrm{d}\boldsymbol{\varphi},\tag{4.12}$$

as shown in Appendix Eq. (A.14).
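The orthonormality of Eq. (4.10) and the transform of Eq. (4.12) can be checked with a uniform Riemann sum, which is exact for band-limited functions on the circle; a minimal sketch:

```python
import numpy as np

def Phi(m, phi):
    """Circular harmonics of Eq. (4.10)."""
    if m < 0:
        return np.sqrt(2)*np.sin(abs(m)*phi)/np.sqrt(2*np.pi)
    if m == 0:
        return np.ones_like(phi)/np.sqrt(2*np.pi)
    return np.sqrt(2)*np.cos(m*phi)/np.sqrt(2*np.pi)

K = 1024
phi = np.linspace(-np.pi, np.pi, K, endpoint=False)
dphi = 2*np.pi/K
N = 3
B = np.stack([Phi(m, phi) for m in range(-N, N+1)])   # rows: m = -3, ..., 3

gram = (B*dphi) @ B.T                 # orthonormality: identity matrix
gamma = (B*dphi) @ np.cos(2*phi)      # Eq. (4.12) for g(phi) = cos(2*phi)
```

Only the coefficient for $m = 2$ is non-zero, $\gamma_2 = \sqrt{\pi}$, since $\cos 2\varphi = \sqrt{\pi}\,\Phi_2(\varphi)$.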

*2D panning function*. An infinitely narrow angular range around a desired direction |ϕ − ϕs| < ε → 0 is represented by the transformation integral over a Dirac delta distribution δ(ϕ − ϕ*s*), cf. Appendix Eq. (A.16), so that the coefficients of such a panning function are

$$
\gamma\_m = \Phi\_m(\varphi\_s). \tag{4.13}
$$

As the infinite circular harmonic series is complete, the panning function is

$$g(\varphi) = \sum_{m=-\infty}^{\infty} \Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi) = \delta(\varphi - \varphi_\mathrm{s}),\tag{4.14}$$

and in practice we limit its resolution to the Nth Ambisonic order, $|m| \leq \mathrm{N}$, and use additional weights $a_m$ that allow us to design its side lobes

**Fig. 4.3** 2D unweighted ($a_m = 1$, basic) and max-$r_\mathrm{E}$-weighted Ambisonic panning functions for the orders N = 1, 2, 5

$$g_\mathrm{N}(\varphi) = \sum_{m=-\mathrm{N}}^{\mathrm{N}} a_m\,\Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi). \tag{4.15}$$

The max-$r_\mathrm{E}$ panning function [24] uses the weights $a_m = \cos\left(\frac{\pi m}{2(\mathrm{N}+1)}\right)$, as derived in Appendix Eq. (A.20). The spread is now adjustable by the order to $\pm\frac{90^\circ}{\mathrm{N}+1}$. The result is shown in Fig. 4.3, compared with no side-lobe suppression, $a_m = 1$ (basic).
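The claimed spread is easily verified for the continuous max-$r_\mathrm{E}$ panning function; a sketch evaluating Eq. (4.15) with $\varphi_\mathrm{s} = 0$, where the sine terms vanish and the $\pm m$ terms combine to $\cos m\varphi/\pi$:

```python
import numpy as np

phi = np.linspace(0, 2*np.pi, 4096, endpoint=False)

def g_maxrE(N, phi):
    """Continuous max-rE panning function of Eq. (4.15), aimed at phi_s = 0."""
    g = np.full_like(phi, 1/(2*np.pi))                 # m = 0 term
    for m in range(1, N+1):
        a_m = np.cos(np.pi*m/(2*(N+1)))                # max-rE weights
        g += a_m*np.cos(m*phi)/np.pi                   # +-m terms combined
    return g

spreads = {}
for N in (1, 2, 5):
    g = g_maxrE(N, phi)
    rE = np.mean(g**2*np.cos(phi))/np.mean(g**2)
    spreads[N] = np.degrees(np.arccos(rE))             # = 90/(N+1) degrees
```

The computed spreads are 45°, 30°, and 15° for N = 1, 2, 5, i.e. exactly $90^\circ/(\mathrm{N}+1)$.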

It is easy to recognize: $\Phi_m(\varphi_\mathrm{s})$ represents the recorded or encoded directions, and $\Phi_m(\varphi)$ represents the decoded playback directions.

*Optimal sampling of the 2D panning function*. In the theory of circular/spherical polynomials in the variable ζ = cos(ϕ − ϕs), so-called *t*-designs in 2D are optimal point sets of given angles {ϕ*l*} with *l* = 1,..., L and size L. A *t*-design allows to perfectly compute the integral (constant part) over the polynomials P*m*(ζ ) of limited degree *m* ≤ *t* by discrete summation

$$\int\_{-\pi}^{\pi} \mathcal{P}\_m(\cos \phi) \,\mathrm{d}\phi = \sum\_{l=1}^{L} \mathcal{P}\_m[\cos(\varphi\_l - \varphi\_s)] \,\frac{2\pi}{\mathcal{L}},\tag{4.16}$$

regardless of any angular shift $\varphi_\mathrm{s}$. In 2D, the Chebyshev polynomials $T_m(\cos\phi) = \cos(m\phi)$ are orthogonal polynomials; therefore an Nth-order panning function composed of $\cos(m\phi)$ terms is always a polynomial of Nth degree. Knowing this, it is clear that the integral over $g_\mathrm{N}^2$ required to evaluate the loudness measure $E$ involves a polynomial of the order 2N. The integral to calculate $\boldsymbol{r}_\mathrm{E}$ is over $g_\mathrm{N}^2\cos\phi$ and thus of the order 2N + 1. In playback, to get a perfectly panning-invariant loudness measure $E$ of the continuous panning function and also a perfectly oriented $\boldsymbol{r}_\mathrm{E}$ vector of constant spread $\arccos r_\mathrm{E}$, the parameter $t$ must fulfill $t \geq 2\mathrm{N}+1$. In 2D, all regular polygons are $t$-designs with $L = t + 1$ points

$$
\varphi\_l = \frac{2\pi}{t+1}(l-1). \tag{4.17}
$$

We can use the smallest set of $2\mathrm{N}+2$ angles $\varphi_l = \frac{180^\circ}{\mathrm{N}+1}(l-1)$ as the optimal 2D layout.

#### **4.5 Ambisonics Encoding and Optimal Decoding in 2D**

To encode a signal $s$ into the Ambisonic signals $\chi_m$, we multiply the signal with the encoder representing the direction of the signal at the angle $\varphi_\mathrm{s}$, i.e. with the weights $\Phi_m(\varphi_\mathrm{s})$

$$\chi_m(t) = \Phi_m(\varphi_\mathrm{s})\,s(t),\tag{4.18}$$

or in vector notation

$$\boldsymbol{\chi}_\mathrm{N} = \boldsymbol{y}_\mathrm{N}(\varphi_\mathrm{s})\,s,\tag{4.19}$$

using the column vector $\boldsymbol{y}_\mathrm{N} = [\Phi_{-\mathrm{N}}(\varphi_\mathrm{s}), \dots, \Phi_\mathrm{N}(\varphi_\mathrm{s})]^\mathrm{T}$ of $2\mathrm{N}+1$ components. The Ambisonic signals in $\boldsymbol{\chi}_\mathrm{N}$ are weighted by the side-lobe-suppressing weights $\boldsymbol{a}_\mathrm{N} = [a_{|-\mathrm{N}|}, \dots, a_\mathrm{N}]^\mathrm{T}$, expressed by the multiplication with a diagonal matrix $\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}$, and then decoded to the $L$ loudspeaker signals $\boldsymbol{x}$ by a sampling decoder

$$\boldsymbol{D} = \sqrt{\frac{2\pi}{L}}\,\bigl[\boldsymbol{y}_\mathrm{N}(\varphi_1),\,\dots,\,\boldsymbol{y}_\mathrm{N}(\varphi_L)\bigr]^\mathrm{T} = \sqrt{\frac{2\pi}{L}}\,\boldsymbol{Y}_\mathrm{N}^\mathrm{T},\tag{4.20}$$

using

$$\boldsymbol{x} = \boldsymbol{D}\,\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{\chi}_\mathrm{N}.\tag{4.21}$$

In total, the system for encoding and decoding can also be written to yield a set of loudspeaker gains for one virtual source

$$\mathbf{g} = \mathbf{D}\operatorname{diag}\{\mathbf{a}\_{\mathrm{N}}\} \,\mathbf{y}\_{\mathrm{N}}(\varphi\_{\mathrm{s}}),\tag{4.22}$$

or in particular for the 2D sampling decoder, $\boldsymbol{g} = \sqrt{\frac{2\pi}{L}}\,\boldsymbol{Y}_\mathrm{N}^\mathrm{T}\,\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\varphi_\mathrm{s})$.
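The complete 2D encoding-decoding chain of Eqs. (4.18)-(4.22) fits in a few lines; a sketch (helper names `y_N` and `pan` are ours) for N = 3 on the optimal ring of L = 2N + 2 = 8 loudspeakers:

```python
import numpy as np

def y_N(N, phi):
    """Vector of circular harmonics [Phi_-N, ..., Phi_N] at angle phi, Eq. (4.10)."""
    m = np.arange(1, N+1)
    return np.concatenate((np.sqrt(2)*np.sin(m[::-1]*phi),
                           [1.0],
                           np.sqrt(2)*np.cos(m*phi)))/np.sqrt(2*np.pi)

N, L = 3, 8
phi_l = 2*np.pi*np.arange(L)/L                                 # uniform ring
D = np.sqrt(2*np.pi/L)*np.stack([y_N(N, p) for p in phi_l])    # Eq. (4.20)
a_N = np.cos(np.pi*np.abs(np.arange(-N, N+1))/(2*(N+1)))       # max-rE weights

def pan(phi_s):
    chi = y_N(N, phi_s)                    # encoder, Eq. (4.19), unit signal
    return D @ (a_N*chi)                   # loudspeaker gains, Eq. (4.22)

g = pan(0.7)
E = np.sum(g**2)                           # panning-invariant loudness
rE = g**2 @ np.stack([np.cos(phi_l), np.sin(phi_l)], axis=1)/E
```

Since the ring is a t = 2N + 1 design, $E$ comes out identical for every $\varphi_\mathrm{s}$, and $\boldsymbol{r}_\mathrm{E}$ has the constant length $\cos 22.5^\circ \approx 0.924$ while pointing exactly at $\varphi_\mathrm{s}$.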

#### **4.6 Listening Experiments on 2D Ambisonics**

There are several listening experiments discussing the features of Ambisonics; most of them are summarized in [25] and are discussed below, complemented by those from [26].

The perceptually adjusted panning angle of 2nd-order max-$r_\mathrm{E}$ Ambisonic panning on 6 horizontal loudspeakers matches the acoustic reference direction quite well, as shown in Fig. 4.4, similar to MDAP in Fig. 3.8, but with a slightly more

**Fig. 4.4** 2nd-order max-*r*E-weighted Ambisonic panning with pink-noise on horizontal rings of 60◦-spaced loudspeakers adjusted to perceptually match reference loudspeaker directions (harmonic complex) every 15◦. Markers and whiskers indicate 95% confidence intervals and medians, black curve the *r*<sup>E</sup> vector model

**Fig. 4.5** Frank's 2008 pointing experiment [27] on center and off-center listening seats for 3 virtual sources (*A, B, C*) using 1st-order (left) and 5th-order (right) Ambisonics on 12 horizontal loudspeakers (IEM CUBE) indicates more stable localization with high orders. Moreover, for 5th order, max-$r_\mathrm{E}$ weighting and the omission of delay compensation were preferred. Omission of the max-$r_\mathrm{E}$ weights ("basic") or the alternative "in-phase" weights that entirely suppress any side lobes yield less precise localization at off-center listening positions

**Fig. 4.6** Experiments at an off-center position in **a** show that max-$r_\mathrm{E}$ weighting outperforms the basic, rectangularly truncated Fourier series at off-center listening positions, **b** where it avoids splitting of the auditory event. Stitt's experiments **c** imply that localization with higher orders is more stable, and that the localization deficiency at off-center listening seats seems to be proportional to the ratio of the distance from the center to the radius of the loudspeaker ring, and not to the specific time delays, which are larger for large loudspeaker rings, cf. [28]

accurate median by 0.5◦ on average, and in particular at side and back panning directions.

Another aspect to investigate is how stable the results are for center and off-center listening seats, as shown in Fig. 4.5. It illustrates that max-$r_\mathrm{E}$ weighting with the highest order achieves the best localization stability at off-center listening seats. Astonishingly, the delay compensation for non-uniform delay times to the center deteriorated the results, most probably because of the nearly linear frontal

**Fig. 4.7** Predicted sweet area sizes using the *r*<sup>E</sup> model Sect. 2.2.9 for loudspeaker layouts and playback orders used in Stitt's experiments [28]: first order (top), third order (bottom), small (left), and large (right)

arrangement of loudspeakers that is more robust to lateral shifts of the listening positions than a circular arrangement.

Figure 4.6a, b shows the direction histogram for two different weightings *am*, and it illustrates that proper sidelobe suppression of the panning function by using max-*r*<sup>E</sup> weights is decisive at shifted listening positions to avoid splitting of the auditory image, as it appears in Fig. 4.6b without the weights (basic).

Peter Stitt's work shows that the localization offsets at off-center listening seats do not increase with the radius of the loudspeaker arrangement as long as the off-center seat stays in proportion to the radius, Fig. 4.6c. The results are predicted by the sweet-area model from Sect. 2.2.9 for the first order (top row) and third order (bottom row) in Fig. 4.7, for both the small (left) and large (right) setups.

**Fig. 4.8** The perceptual sweet spot size as investigated by Frank [29] nearly covers the entire area enclosed by the IEM CUBE as a playback setup (black = 5th, gray = 3rd, light gray = 1st order Ambisonics). It is smallest for 1st-order Ambisonics

Frank's 2016 experiments [29] used scales on the floor from which listeners read off where the sweet area ends in every radial direction, cf. Fig. 4.8a. For Fig. 4.8b, the criterion for listeners to indicate leaving the sweet area was that the frontally panned sound was mapped outside the loudspeakers L, C, and R. It showed that a sweet area providing perceptually plausible playback measures at least $\frac{2}{3}$ of the radius of the loudspeaker setup if the order is high enough.

The perceived width of auditory events is investigated in the experimental results of Fig. 4.9, [25], in which pink noise was frontally panned for different orientations of the loudspeaker ring (with one loudspeaker in front, and with the front direction lying quarter- and half-way between loudspeakers w.r.t. the loudspeaker spacing). Listeners compared the width of multiple stimuli, and the results were expected to indicate constant width for the differently rotated loudspeaker ring, as the optimal arrangement with $L = 2\mathrm{N}+2$ provides constant $r_\mathrm{E}$ length. The panning-invariant length is not perfectly reflected in the perceived widths with 3rd order on 8 loudspeakers, for which the on-loudspeaker position is perceived as significantly wider. By contrast, the high-order experiment with 7th order on 16 loudspeakers perfectly validates the model.

Figure 4.10 shows experiments investigating the time-variant change in sound coloration for a pink-noise virtual source rotating at a speed of 100°/s, and for different Ambisonic panning setups. There is an obvious advantage of a reduced fluctuation in coloration at both listening positions, centered and off-center, when using the side-lobe-suppressing "max-$r_\mathrm{E}$" weighting instead of the "basic" rectangular truncation of the Fourier series. At the off-center listening position, max-$r_\mathrm{E}$ weights achieve good results with regard to constant coloration for both investigated arrangements, 3rd order on 8 and 7th order on 16 loudspeakers.

*How well are diffuse signals preserved in playback?* All the above experiments deal with how non-diffuse signals are presented. To complement what is shown in Fig. 1.21 of Chap. 1 with an explanation, the relation between the Ambisonic order and its ability to preserve diffuse fields is estimated here by the covariance between uncorrelated directions. Assume a max-$r_\mathrm{E}$-weighted Nth-order Ambisonic panning function $g(\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})$, normalized to $g(1) = 1$, encodes two sounds $s_{1,2}$ from the two directions $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, with the sounds being uncorrelated and of unit variance, $E\{s_i s_j\} = \delta_{ij}$. We find that the Ambisonic representation mixes the sounds at their respective mapped directions, $x_1 = s_1 + g_{12}\,s_2$ and $x_2 = s_2 + g_{12}\,s_1$, using the crosstalk $g_{12} = g(\cos\phi)$, and thereby increases their correlation:

$$r_{x_1 x_2} = \frac{E\{x_1 x_2\}}{\sqrt{E\{x_1^2\}\,E\{x_2^2\}}} = \frac{E\{(1+g_{12}^2)\,s_1 s_2 + g_{12}\,(s_1^2+s_2^2)\}}{\sqrt{E\{s_1^2 + 2 g_{12}\, s_1 s_2 + g_{12}^2\, s_2^2\}\;E\{s_2^2 + 2 g_{12}\, s_1 s_2 + g_{12}^2\, s_1^2\}}} = \frac{2\,g_{12}}{1+g_{12}^2}. \tag{4.23}$$

This result was presented in Fig. 1.21 and was used to argue that the directional separation of first-order Ambisonics, with its high crosstalk term $g_{12}$, might be too weak. Higher-order Ambisonics decreases this directional crosstalk and therefore improves the representation of diffuse sound fields.
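The correlation increase of Eq. (4.23) is easy to check numerically. A minimal sketch (names are our own; white Gaussian noise stands in for the pink noise used in the experiments):

```python
import numpy as np

# Two originally uncorrelated, unit-variance signals.
rng = np.random.default_rng(1)
n = 200_000
s1, s2 = rng.standard_normal(n), rng.standard_normal(n)

def correlation_after_panning(g12):
    """Empirical correlation of x1 = s1 + g12*s2 and x2 = s2 + g12*s1."""
    x1, x2 = s1 + g12 * s2, s2 + g12 * s1
    return np.mean(x1 * x2) / np.sqrt(np.mean(x1**2) * np.mean(x2**2))

for g12 in (0.0, 0.3, 0.7):
    predicted = 2 * g12 / (1 + g12**2)   # Eq. (4.23)
    assert abs(correlation_after_panning(g12) - predicted) < 0.01
```

A high crosstalk gain thus maps two uncorrelated sources to strongly correlated loudspeaker signals; increasing the order shrinks $g_{12}$ and with it the correlation.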

#### **4.7 Panning with Spherical Harmonics in 3D**

In three space dimensions, the spherical coordinate system has a radius $r$ and two angles: the azimuth $\varphi$, indicating the polar angle of the orthogonal projection onto the $xy$ plane, and the zenith angle $\vartheta$, indicating the angle to the $z$ axis, according to the right-handed spherical coordinate systems in ISO 31-11 and ISO 80000-2 [30, 31], Fig. 4.11.

By the generalized chain rule, Appendix A.3 re-writes the Laplacian in spherical coordinates in 3D, with $r$ signifying the radius, $\varphi$ the azimuth angle, and the zenith angle $\vartheta$ re-expressed as $\zeta = \frac{z}{r} = \cos\vartheta$, yielding the operator

$$\Delta = \frac{2}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial r^2} + \frac{1}{r^2(1-\zeta^2)}\frac{\partial^2}{\partial\varphi^2} - \frac{2\zeta}{r^2}\frac{\partial}{\partial\zeta} + \frac{1-\zeta^2}{r^2}\frac{\partial^2}{\partial\zeta^2}.$$

Any radius-dependent part is removed to define an eigenproblem yielding the basis for panning functions, taking only $r^2\Delta_{\varphi,\zeta,\mathrm{3D}}$,

$$\left[\frac{1}{1-\zeta^2}\frac{\partial^2}{\partial\varphi^2} - 2\zeta\,\frac{\partial}{\partial\zeta} + (1-\zeta^2)\frac{\partial^2}{\partial\zeta^2}\right] Y = -\lambda\, Y \tag{4.24}$$

whose solution with λ = *n*(*n* + 1) defines the spherical harmonics

$$Y\_n^m(\theta) = Y\_n^m(\varphi, \vartheta) = \Theta\_n^m(\vartheta) \, \Phi\_m(\varphi). \tag{4.25}$$

The prerequisites are (i) periodicity in $\varphi$ and (ii) that the function $Y_n^m$ is finite on the sphere. In addition to the circular harmonics $\Phi_m$ expressing the dependency on the azimuth $\varphi$ according to Eq. (4.10), the spherical harmonics contain the associated Legendre functions $P_n^m$ and their normalization term

$$\Theta\_n^m(\vartheta) = N\_n^{|m|} P\_n^{|m|}(\cos \vartheta) \tag{4.26}$$

**Fig. 4.11** The spherical coordinate system


to express the dependency on the zenith angle $\vartheta$. The index $n \ge 0$ expresses the order, and the directional resolution can be limited by requiring $0 \le n \le \mathrm{N}$. The index $m$ is the degree, and for each $n$ it is limited by $-n \le m \le n$.

The spherical harmonics, Fig. 4.12, are orthonormal on the sphere −π ≤ ϕ ≤ π and 0 ≤ ϑ ≤ π, and for unbounded order N → ∞ they are complete; see also Appendix A.3.7.

The spherical harmonics permit a series representation of square-integrable 3D directional functions by the coefficients γ*nm*,

$$g(\boldsymbol{\theta}) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \gamma_{nm}\, Y_n^m(\boldsymbol{\theta}). \tag{4.27}$$

From a known function *g*(*θ*), the coefficients are obtained by the transformation integral over the unit sphere S2, cf. appendix Eq. (A.38)

$$\gamma\_{nm} = \int\_{\mathbb{S}^2} \mathbf{g}(\theta) \, Y\_n^m(\theta) \, \mathrm{d}\theta. \tag{4.28}$$

*Note that the above N3D normalization $\int_{\boldsymbol{\theta}\in\mathbb{S}^2} |Y_n^m(\boldsymbol{\theta})|^2\,\mathrm{d}\boldsymbol{\theta} = 1$ defines each spherical harmonic except for an arbitrary phase it might be multiplied with. Legendre functions for the zenith dependency might be defined differently in literature, and for azimuth, some implementations use $\sin(m\varphi)$ instead of $\sin(|m|\varphi)$. In Ambisonics, real-valued functions and the SN3D normalization $\sqrt{(2-\delta_m)\,\frac{(n-|m|)!}{(n+|m|)!}}$ are preferred, as are positive signs of the first-order dipole components in the directions of the respective coordinate axes $x$, $y$, $z$. This might require to involve*

**Fig. 4.12** Spherical harmonics indexed by the Ambisonic channel number $\mathrm{ACN} = n^2 + n + m$; rows show spherical harmonics of the orders $0 \le n \le 3$ with the $2n+1$ harmonics of the degrees $-n \le m \le n$. Plotted is a polar diagram with the radius $R = 20\,\lg|Y_n^m|$ normalized to the upper 30 dB of each pattern, with gray and black indicating positive and negative sign, respectively. The order $n$ counts the circular zero crossings, and $|m|$ counts those running through zenith and nadir

*the Condon–Shortley phase $(-1)^m$ to correct the signs of the Legendre functions, or $-1$ for $m<0$ to correct the sign of the azimuthal sinusoids, depending on the implementation of the respective functions. It is often helpful to employ converters and directional checks to ensure compatibility!*

*3D panning function*. An infinitely narrow directional range around a desired direction, $\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta} > \cos\epsilon \to 1$, is represented by the transformation integral over the Dirac delta $\delta(1-\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta})$, cf. Eq. (A.41), so that the coefficients of the panning function are

$$\gamma\_{nm} = Y\_n^m(\theta\_s). \tag{4.29}$$

As the infinite set of spherical harmonics is complete, the panning function is

$$g(\boldsymbol{\theta}) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_\mathrm{s})\, Y_n^m(\boldsymbol{\theta}) = \delta(1-\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta}), \tag{4.30}$$

and in practice, the finite-resolution Nth-order panning function with $n \le \mathrm{N}$ employs weights $a_n$ to reduce side lobes and optimize the spread

$$\text{g}\_{\text{N}}(\boldsymbol{\theta}) = \sum\_{n=0}^{\text{N}} \sum\_{m=-n}^{n} a\_{n} \, Y\_{n}^{m}(\boldsymbol{\theta}\_{\text{s}}) \, Y\_{n}^{m}(\boldsymbol{\theta}). \tag{4.31}$$

The max-$r_\mathrm{E}$ panning function uses the weights $a_n = P_n\!\left(\cos\frac{137.9^\circ}{\mathrm{N}+1.51}\right)$, as derived in Appendix Eq. (A.46). The spread is now adjustable by the order to $\pm\frac{137.9^\circ}{\mathrm{N}+1.51}$. Figure 4.13 shows a comparison to the basic weighting $a_n = 1$. An alternative expression that uses the Legendre polynomials $P_n$ and only depends on the angle $\phi$ to the panning direction $\boldsymbol{\theta}_\mathrm{s}$ is obtained by replacing the sum over $m$ by the *spherical harmonics addition theorem* $\sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_\mathrm{s})\, Y_n^m(\boldsymbol{\theta}) = \frac{2n+1}{4\pi} P_n(\cos\phi)$,

$$g_\mathrm{N}(\phi) = \sum_{n=0}^{\mathrm{N}} \frac{2n+1}{4\pi}\, a_n\, P_n(\cos\phi). \tag{4.32}$$

Comparison to first-order Ambisonics shows: now $Y_n^m(\boldsymbol{\theta}_\mathrm{s})$ represents the recorded or encoded directions, and $Y_n^m(\boldsymbol{\theta})$ represents the decoded playback directions.
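Equation (4.32) is straightforward to evaluate numerically. A minimal sketch (function names are our own), using the Bonnet recurrence for the Legendre polynomials:

```python
import numpy as np

def legendre_all(N, x):
    """P_0(x) ... P_N(x) by the Bonnet recurrence; x scalar or ndarray."""
    x = np.asarray(x, dtype=float)
    P = np.zeros((N + 1,) + x.shape)
    P[0] = 1.0
    if N >= 1:
        P[1] = x
    for n in range(1, N):
        P[n + 1] = ((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1)
    return P

def panning_3d(phi, N, max_re=True):
    """g_N(phi) of Eq. (4.32), with basic (a_n = 1) or max-rE weights."""
    n = np.arange(N + 1)
    a = (legendre_all(N, np.cos(np.radians(137.9 / (N + 1.51))))
         if max_re else np.ones(N + 1))          # a_n = P_n(cos(137.9deg/(N+1.51)))
    return np.tensordot((2 * n + 1) / (4 * np.pi) * a,
                        legendre_all(N, np.cos(phi)), axes=(0, 0))

# The rear side lobe is clearly suppressed by the max-rE weights:
for N in (1, 3, 5):
    assert abs(panning_3d(np.pi, N)) < abs(panning_3d(np.pi, N, max_re=False))
```

Plotting `panning_3d` over $\phi \in [0, \pi]$ reproduces the shapes of Fig. 4.13.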

*Optimal sampling of the 3D panning function*. In the theory of spherical polynomials in the variable $\zeta = \boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta}$, so-called *t*-designs describe point sets of given directions $\{\boldsymbol{\theta}_l\}$ with $l = 1,\dots,\mathrm{L}$ and size L that allow to perfectly compute the integral (constant part) over the polynomials $P_n(\zeta)$ of limited order $n \le t$ by discrete summation

$$\int_{-\pi}^{\pi}\mathrm{d}\varphi \int_{-1}^{1} P_n(\zeta)\,\mathrm{d}\zeta = \sum_{l=1}^{\mathrm{L}} P_n(\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{\theta}_l)\,\frac{4\pi}{\mathrm{L}}, \tag{4.33}$$

**Fig. 4.13** 3D basic ($a_n = 1$) and max-$r_\mathrm{E}$-weighted Ambisonic panning functions for the orders N = 1, 2, 5

relative to any axis $\boldsymbol{\theta}_\mathrm{s}$ the point set is projected onto. In 3D, the Legendre polynomials $P_n(\zeta)$ are orthogonal polynomials, therefore an Nth-order panning function composed thereof is a polynomial of Nth order. The loudness measure $E$ is calculated by the integral over $g_\mathrm{N}^2$, therefore over a polynomial of the order 2N. The integral to calculate $\boldsymbol{r}_\mathrm{E}$ runs over $g_\mathrm{N}^2\,\zeta$, therefore over a polynomial of the order 2N + 1. In playback, to get a perfectly panning-invariant loudness measure $E$ of the continuous panning function and also the perfectly oriented $\boldsymbol{r}_\mathrm{E}$ vector of constant spread $\arccos r_\mathrm{E}$, the parameter $t$ must be $t \ge 2\mathrm{N}+1$. In 3D, there are only 5 geometrically regular layouts, the Platonic solids.


For instance, for N = 1, the octahedron is a suitable spherical design, for N = 2, the icosahedral or dodecahedral layouts are suitable.
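The quadrature property of Eq. (4.33) can be verified for the octahedron, which is a spherical 3-design; a small sketch under that assumption (names are our own):

```python
import numpy as np

# Octahedron vertices: a spherical 3-design, hence optimal for N = 1 (t = 2N+1 = 3).
theta_l = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
L = len(theta_l)

def legendre(n, x):
    """P_n(x) by the Bonnet recurrence."""
    p_prev, p = np.ones_like(x), np.asarray(x, dtype=float)
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p_prev if n == 0 else p

rng = np.random.default_rng(0)
theta_s = rng.standard_normal(3)
theta_s /= np.linalg.norm(theta_s)            # arbitrary projection axis
zeta = theta_l @ theta_s

for n in range(4):                            # Eq. (4.33) holds for all n <= t = 3
    discrete = np.sum(legendre(n, zeta)) * 4 * np.pi / L
    exact = 4 * np.pi if n == 0 else 0.0      # sphere integral of P_n
    assert abs(discrete - exact) < 1e-12
```

For $n = 4 > t$ the discrete sum no longer matches the integral, which is why higher orders need larger designs.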

Exceeding the geometrically regular layouts, there are designs found by numerical optimization to be regular under the mathematical rule to approximate $\int_{\mathbb{S}^2} Y_n^m(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \sqrt{4\pi}\,\delta_n$ accurately by $\frac{4\pi}{\mathrm{L}}\sum_l Y_n^m(\boldsymbol{\theta}_l)$ for all $n \le t$ and $|m| \le n$. Large collections by Hardin and Sloane [32], Gräf and Potts [33], and Womersley [34] are available on the following websites:

http://neilsloane.com/sphdesigns/dim3/

http://homepage.univie.ac.at/manuel.graef/quadrature.php

(Chebyshev-type Quadratures on S2), and

https://web.maths.unsw.edu.au/~rsw/Sphere/EffSphDes/ss.html.

Figure 4.14 gives some graphical examples.

**Fig. 4.14** *t*-designs from Gräf's website (*Chebyshev-type quadrature*)

#### **4.8 Ambisonic Encoding and Optimal Decoding in 3D**

To encode a signal $s$ into the Ambisonic signals $\chi_{nm}$, we multiply the signal with the encoder representing the signal's direction $\boldsymbol{\theta}_\mathrm{s}$, i.e., with the weights $Y_n^m(\boldsymbol{\theta}_\mathrm{s})$:

$$\chi\_{nm}(t) = Y\_n^m(\theta\_s) \,\mathrm{s}(t),\tag{4.34}$$

or in vector notation

$$\boldsymbol{\chi}_\mathrm{N} = \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s})\, s, \tag{4.35}$$

using the column vector $\boldsymbol{y}_\mathrm{N} = [Y_0^0(\boldsymbol{\theta}_\mathrm{s}), Y_1^{-1}(\boldsymbol{\theta}_\mathrm{s}), \dots, Y_\mathrm{N}^\mathrm{N}(\boldsymbol{\theta}_\mathrm{s})]^\mathrm{T}$ of $(\mathrm{N}+1)^2$ components. The Ambisonic signals in $\boldsymbol{\chi}_\mathrm{N}$ are weighted by the side-lobe-suppressing weights $\boldsymbol{a}_\mathrm{N} = [a_0, a_1, a_1, a_1, a_2, \dots, a_\mathrm{N}]^\mathrm{T}$, expressed by multiplication with the diagonal matrix $\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}$, and then decoded to the L loudspeaker signals $\boldsymbol{x}$ by a sampling decoder

$$\boldsymbol{D} = \sqrt{\frac{4\pi}{\mathrm{L}}}\,[\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_1), \dots, \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{L})]^\mathrm{T} = \sqrt{\frac{4\pi}{\mathrm{L}}}\;\boldsymbol{Y}_\mathrm{N}^\mathrm{T}, \tag{4.36}$$

using

$$\boldsymbol{x} = \boldsymbol{D}\,\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{\chi}_\mathrm{N}. \tag{4.37}$$

In total, the system for encoding Eq. (4.35) and decoding Eq. (4.36) can also be written to yield loudspeaker gains for one signal

$$\mathbf{g} = D \operatorname{diag} \{ \mathbf{a}\_{\mathcal{N}} \} \,\, \mathbf{y}\_{\mathcal{N}}(\theta\_{\mathbf{s}}),\tag{4.38}$$

or in particular, for the 3D sampling decoder, $\boldsymbol{g} = \sqrt{\frac{4\pi}{\mathrm{L}}}\,\boldsymbol{Y}_\mathrm{N}^\mathrm{T}\,\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s})$.
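Equations (4.34)–(4.38) can be sketched for N = 1 with the first-order real spherical harmonics written out by hand (N3D-normalized, ACN ordering); the octahedral layout and the helper names are our own illustration:

```python
import numpy as np

def sh1(theta):
    """Real first-order spherical harmonics, N3D-normalized, ACN order W,Y,Z,X."""
    x, y, z = theta
    c = np.sqrt(3 / (4 * np.pi))
    return np.array([np.sqrt(1 / (4 * np.pi)), c * y, c * z, c * x])

# Octahedral layout: a 3-design, i.e. optimal for N = 1 (t = 2N+1 = 3).
layout = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                   [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
Y = np.stack([sh1(t) for t in layout], axis=1)        # (N+1)^2 x L
D = np.sqrt(4 * np.pi / len(layout)) * Y.T            # sampling decoder, Eq. (4.36)

a1 = np.cos(np.radians(137.9 / (1 + 1.51)))           # max-rE weight a_1
a_N = np.array([1.0, a1, a1, a1])                     # [a_0, a_1, a_1, a_1]

def gains(theta_s):
    """Loudspeaker gains g = D diag{a_N} y_N(theta_s), Eq. (4.38)."""
    return D @ (a_N * sh1(theta_s))

# On this optimal layout, E = ||g||^2 is panning-invariant:
rng = np.random.default_rng(0)
dirs = rng.standard_normal((50, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
E = np.array([np.sum(gains(t) ** 2) for t in dirs])
assert np.ptp(E) < 1e-12
```

With $\boldsymbol{a}_\mathrm{N} = \boldsymbol{1}$ the invariance holds as well; the weights only shrink the side lobes.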

#### **4.9 Ambisonic Decoding to Loudspeakers**

Ambisonic decoding to loudspeakers has been dealt with by numerous researchers: in the past particularly because results are not very stable for first-order Ambisonics, and later because they strongly depend on how uniform the loudspeaker layout is for higher-order Ambisonics. Moreover, Solvang found that even the use of too many loudspeakers has a degrading effect [35].

For first-order decoding, the Vienna decoders by Michael Gerzon [36] are often cited; for higher-order Ambisonic decoding, one can find, e.g., works by Daniel on max-$r_\mathrm{E}$ [37] and pseudo-inverse decoding [10], and also by Poletti [14, 38, 39].

What turned out to be the most practical solution is the All-Round Ambisonic Decoding (AllRAD) approach, due to its feature of allowing imaginary-loudspeaker insertion and downmix as described in the sections above, cf. [40]. Moreover, it does not restrict the Ambisonic order, which for other decoders often yields poor controllability of panning-dependent fluctuations in loudness and directional mapping errors.

The playable set of directions $\boldsymbol{\theta}_l$ or $\varphi_l$ is usually finite and discrete, and it is represented by the surrounding loudspeakers' directions. The directional distribution of the surrounding loudspeakers is typically not a *t*-design (with $t \ge 2\mathrm{N}+1$ in general), and in 2D sometimes not even a regular polygon with $\mathrm{L} \ge 2\mathrm{N}+2$ loudspeakers. In such cases, it is extremely helpful to be aware of the properties of the various decoder design methods.

#### *4.9.1 Sampling Ambisonic Decoder (SAD)*

The sampling decoder as introduced above is the simplest decoding method. For two dimensions (D = 2) and three (D = 3), it uses the matrix $\boldsymbol{Y}_\mathrm{N} = [\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_1), \dots, \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{L})]$ containing the respective circular or spherical harmonics $\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta})$ sampled at the loudspeaker directions $\{\boldsymbol{\theta}_l\}$,

$$\boldsymbol{D} = \sqrt{\frac{S_{\mathrm{D}-1}}{\mathrm{L}}}\,\boldsymbol{Y}_\mathrm{N}^\mathrm{T}, \tag{4.39}$$

with the circumference of the unit circle denoted as $S_1 = 2\pi$ or the surface of the unit sphere as $S_2 = 4\pi$. The factor $\frac{S_{\mathrm{D}-1}}{\mathrm{L}}$ expresses that each loudspeaker synthesizes a fraction of the $E$ measure on the circle or sphere of surrounding directions. However, the sampling decoder yields neither perfectly constant loudness and width measures, $E$ and $r_\mathrm{E}$, nor a correct aiming of the localization measure $\boldsymbol{r}_\mathrm{E}$ if the loudspeaker layout isn't optimal. Concerning loudness, for instance, panning towards directional regions of poor loudspeaker coverage misses out the main lobe of the panning function, yielding a noticeably reduced loudness.

#### *4.9.2 Mode Matching Decoder (MAD)*

The mode-matching method is used in [10, 39] and yields a fundamentally different decoder design. Its concept is to re-encode the gain vector $\boldsymbol{g}$ of the loudspeakers for any panning direction $\boldsymbol{\theta}_\mathrm{s}$ by the encoding matrix $\boldsymbol{Y}_\mathrm{N} = [\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_1), \dots, \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{L})]$ for all loudspeaker directions $\{\boldsymbol{\theta}_l\}$. Ideally, the re-encoded result should match the encoding of the panning direction with side lobes suppressed:

$$\boldsymbol{Y}_\mathrm{N}\,\boldsymbol{g} = \mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s}).$$

Using the definition *g* = *D* diag{*a*N} *y*N(*θs*) of the panning gains, we obtain

$$\begin{aligned} \mathbf{Y\_N} \mathbf{D} \operatorname{diag} \{ \mathbf{a\_N} \} \mathbf{y\_N}(\theta\_s) &= \operatorname{diag} \{ \mathbf{a\_N} \} \mathbf{y\_N}(\theta\_s), \\ \Rightarrow \mathbf{D} &= \sqrt{\frac{\mathbf{L}}{S\_{\mathbb{D}-1}}} \mathbf{Y\_N^T} (\mathbf{Y\_N} \mathbf{Y\_N^T})^{-1} \end{aligned} \tag{4.40}$$

so that the decoder $\boldsymbol{D}$ is required to be right-inverse to the matrix $\boldsymbol{Y}_\mathrm{N}$, i.e. $\boldsymbol{Y}_\mathrm{N}\boldsymbol{Y}_\mathrm{N}^\mathrm{T}(\boldsymbol{Y}_\mathrm{N}\boldsymbol{Y}_\mathrm{N}^\mathrm{T})^{-1} = \boldsymbol{I}$, see Eq. (A.63) in Appendix A.4. For the inverse of $\boldsymbol{Y}_\mathrm{N}\boldsymbol{Y}_\mathrm{N}^\mathrm{T}$ to exist, it is necessary to have at least as many loudspeakers as harmonics, i.e. $\mathrm{L} \ge (\mathrm{N}+1)^2$ for D = 3 or $\mathrm{L} \ge 2\mathrm{N}+1$ for D = 2. However, this is not a sufficient criterion yet: in directions poorly covered with loudspeakers, the inversion will boost the loudness, so that $(\boldsymbol{Y}_\mathrm{N}\boldsymbol{Y}_\mathrm{N}^\mathrm{T})^{-1}$ is often numerically ill-conditioned unless the loudspeaker layout is at least uniformly designed. Mode-matching decoding is ill-conditioned on hemispherical or semicircular loudspeaker layouts. The solution is equivalently described by the more general pseudo-inverse $\boldsymbol{Y}_\mathrm{N}^\dagger$, which is the right-inverse for fat matrices.
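The right-inverse property and the conditioning problem can be sketched numerically; `np.linalg.pinv` yields $\boldsymbol{Y}_\mathrm{N}^\mathrm{T}(\boldsymbol{Y}_\mathrm{N}\boldsymbol{Y}_\mathrm{N}^\mathrm{T})^{-1}$ for full-rank fat matrices (the $\sqrt{\mathrm{L}/S_{\mathrm{D}-1}}$ scaling is omitted, and the cube layout is our own example):

```python
import numpy as np

def sh1(theta):
    """Real first-order spherical harmonics, N3D-normalized, ACN order W,Y,Z,X."""
    x, y, z = theta
    c = np.sqrt(3 / (4 * np.pi))
    return np.array([np.sqrt(1 / (4 * np.pi)), c * y, c * z, c * x])

# Cube layout: L = 8 >= (N+1)^2 = 4 loudspeakers for N = 1.
layout = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1)
                   for sz in (-1, 1)], dtype=float) / np.sqrt(3)
Y = np.stack([sh1(t) for t in layout], axis=1)     # fat 4 x 8 matrix

D = np.linalg.pinv(Y)                              # = Y^T (Y Y^T)^{-1}
assert np.allclose(Y @ D, np.eye(4))               # re-encoding matches exactly

# On a hemispherical subset (upper 4 vertices), conditioning degrades badly,
# because W and Z are no longer distinguishable on a single elevation ring:
Y_hemi = Y[:, layout[:, 2] > 0]
cond_full = np.linalg.cond(Y @ Y.T)
cond_hemi = np.linalg.cond(Y_hemi @ Y_hemi.T)
assert cond_hemi > cond_full
```

The exploding condition number is exactly the loudness-boosting behavior described above for poorly covered directions.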

#### *4.9.3 Energy Preservation on Optimal Layouts*

For instance, for an order of N = 3, 2D Ambisonics should work optimally with a ring of 45°-spaced loudspeakers on the horizon, a circular (2N + 1)-design, or, for 3D, a spherical (2N + 1)-design. On a *t*-design selected by $t \ge 2\mathrm{N}$, the loudness measure $E$ is panning-invariant in general,

$$E = \|\boldsymbol{g}\|^2 = \boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta}_\mathrm{s})\,\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\underbrace{\boldsymbol{D}^\mathrm{T}\boldsymbol{D}}_{=\boldsymbol{I}}\,\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s}) = \left\|\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s})\right\|^2 = \mathrm{const.}$$

This is because a *t* ≥ 2N-design discretization preserves orthonormality

$$\int \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta})\,\boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \frac{S_{\mathrm{D}-1}}{\mathrm{L}}\sum_{l=1}^{\mathrm{L}} \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_l)\,\boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta}_l) = \frac{S_{\mathrm{D}-1}}{\mathrm{L}}\,\boldsymbol{Y}_\mathrm{N}\boldsymbol{Y}_\mathrm{N}^\mathrm{T} = \boldsymbol{I}, \tag{4.41}$$

**Fig. 4.15** Analysis of loudness, localization error, and width for 3rd-order sampling (SAD) and mode-matching (MAD) Ambisonic decoding for all panning angles on a sub-optimal 45◦ spaced loudspeaker ring with gap at −90◦, compared to VBIP

which implies $\boldsymbol{D}^\mathrm{T}\boldsymbol{D} = \frac{S_{\mathrm{D}-1}}{\mathrm{L}}\,\boldsymbol{Y}_\mathrm{N}\boldsymbol{Y}_\mathrm{N}^\mathrm{T} = \boldsymbol{I}$ for the sampling decoder, and we notice the panning-invariant norm of $g(\boldsymbol{\theta})$ within its coefficients $\boldsymbol{\gamma}_\mathrm{N} = \mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s})$ by the Parseval theorem $\int g^2(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \|\boldsymbol{\gamma}_\mathrm{N}\|^2$. The panning-invariant $E$ measure also holds for the mode-matching decoder using a $t \ge 2\mathrm{N}$-design, as it becomes equivalent to a sampling decoder: $\boldsymbol{D} = \sqrt{\frac{\mathrm{L}}{S_{\mathrm{D}-1}}}\,\boldsymbol{Y}_\mathrm{N}^\mathrm{T}(\boldsymbol{Y}_\mathrm{N}\boldsymbol{Y}_\mathrm{N}^\mathrm{T})^{-1} = \sqrt{\frac{\mathrm{L}}{S_{\mathrm{D}-1}}}\,\boldsymbol{Y}_\mathrm{N}^\mathrm{T}\,\frac{S_{\mathrm{D}-1}}{\mathrm{L}} = \sqrt{\frac{S_{\mathrm{D}-1}}{\mathrm{L}}}\,\boldsymbol{Y}_\mathrm{N}^\mathrm{T}$. Under these ideal conditions, both decoders are energy-preserving.

#### *4.9.4 Loudness Deficiencies on Sub-optimal Layouts*

For 2D layouts, Fig. 4.15 shows what happens if a decoder calculated for a $t \ge 2\mathrm{N}+1$-design has one loudspeaker removed: while, for panning across the gap, the sampling Ambisonic decoder (SAD) yields a quieter signal, moderate localization errors, and width fluctuation, the mode-matching decoder (MAD) yields a strong loudness increase and severe jumps in localization and width. MAD is therefore not very practical on sub-optimal layouts, and SAD only slightly more so.

#### *4.9.5 Energy-Preserving Ambisonic Decoder (EPAD)*

To establish panning-invariant loudness when decoding to non-uniform surround loudspeaker layouts, one can ensure a constant loudness measure $E$ by enforcing $\boldsymbol{D}^\mathrm{T}\boldsymbol{D} = \boldsymbol{I}$, which is otherwise only achieved on $t \ge 2\mathrm{N}$-designs. We may search for a decoding matrix $\boldsymbol{D}$ whose entries are closest to the sampling decoder under the constraint of being column-orthogonal:

$$\left\|\boldsymbol{D} - \sqrt{\frac{S_{\mathrm{D}-1}}{\mathrm{L}}}\,\boldsymbol{Y}_\mathrm{N}^\mathrm{T}\right\|_\mathrm{Fro}^2 \to \min \tag{4.42}$$

$$\text{subject to } \boldsymbol{D}^\mathrm{T}\boldsymbol{D} = \boldsymbol{I}.$$

The singular value decomposition of

$$\boldsymbol{Y}_\mathrm{N}^\mathrm{T} = \boldsymbol{U}\,[\mathrm{diag}\{\boldsymbol{s}\},\ \boldsymbol{0}]^\mathrm{T}\,\boldsymbol{V}^\mathrm{T} \tag{4.43}$$

can be used to create

$$\boldsymbol{D} = \boldsymbol{U}\,[\boldsymbol{I},\ \boldsymbol{0}]^\mathrm{T}\,\boldsymbol{V}^\mathrm{T} \tag{4.44}$$

by replacing the singular values $\boldsymbol{s}$ with ones. Such a decoder is column-orthogonal, as the singular-value decomposition delivers $\boldsymbol{U}^\mathrm{T}\boldsymbol{U} = \boldsymbol{I}$ and $\boldsymbol{V}\boldsymbol{V}^\mathrm{T} = \boldsymbol{I}$, and as a consequence<sup>1</sup> $\boldsymbol{D}^\mathrm{T}\boldsymbol{D} = \boldsymbol{I}$. The energy-preserving decoder in this basic version requires $\mathrm{L} \ge 2\mathrm{N}+1$ loudspeakers in 2D or $\mathrm{L} \ge (\mathrm{N}+1)^2$ in 3D to work.

Note that if the loudspeaker setup directions are already a *t* ≥ 2N design, the sampling, mode-matching, and energy-preserving decoders are equivalent.
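In code, Eqs. (4.43) and (4.44) amount to an SVD whose singular values are replaced by ones. A sketch for N = 1 on an irregular 5-loudspeaker layout of our own choosing (4 on the horizon, 1 at zenith; not a *t*-design):

```python
import numpy as np

def sh1(theta):
    """Real first-order spherical harmonics, N3D-normalized, ACN order W,Y,Z,X."""
    x, y, z = theta
    c = np.sqrt(3 / (4 * np.pi))
    return np.array([np.sqrt(1 / (4 * np.pi)), c * y, c * z, c * x])

# Irregular hemispherical layout: 4 horizontal loudspeakers + 1 at zenith.
layout = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0],
                   [0, -1, 0], [0, 0, 1]], dtype=float)
Y = np.stack([sh1(t) for t in layout], axis=1)          # 4 x 5

# Y^T = U [diag{s}, 0]^T V^T, Eq. (4.43); keep U and V, discard s:
U, s, Vt = np.linalg.svd(Y.T, full_matrices=True)
D = U[:, :len(s)] @ Vt                                  # Eq. (4.44)

assert np.allclose(D.T @ D, np.eye(4))                  # energy preserving
```

Unlike the sampling decoder on this layout, $\boldsymbol{D}^\mathrm{T}\boldsymbol{D} = \boldsymbol{I}$ holds exactly, so $E = \|\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s})\|^2$ is panning-invariant.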

#### *4.9.6 All-Round Ambisonic Decoding (AllRAD)*

In Chap. 3 on vector-base amplitude panning methods, a well-balanced panning result in terms of loudness, width, and localization was achieved by MDAP, which distributes a signal to an arrangement of several superimposed VBAP virtual sources. Hereby $E = \mathrm{const.}$, $\boldsymbol{r}_\mathrm{E} \approx r_\mathrm{E}\,\boldsymbol{\theta}_\mathrm{s}$, and $r_\mathrm{E} \approx \mathrm{const.}$ This works for nearly any loudspeaker layout.

While, to calculate loudspeaker gains, MDAP superimposes an arrangement of discrete virtual sources within a range of ±α around the panning direction *θ*s, one could also think of superimposing a quasi-continuous distribution of virtual sources that are weighted by a continuous panning function *g*(*θ*).

The ideal continuous panning function $g(\boldsymbol{\theta})$ of axisymmetric directional spread around the panning direction $\boldsymbol{\theta}_\mathrm{s}$ is the Ambisonic panning function $g(\boldsymbol{\theta}) = \boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta})\,\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s})$. This rotation-invariant continuous function is optimal in terms of the loudness, width, and localization measures, which are all evaluated by continuous integrals: $E = \int g^2(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \mathrm{const.}$ expresses panning-invariant loudness, and $\boldsymbol{r}_\mathrm{E} = \frac{1}{E}\int g^2(\boldsymbol{\theta})\,\boldsymbol{\theta}\,\mathrm{d}\boldsymbol{\theta} = r_\mathrm{E}\,\boldsymbol{\theta}_\mathrm{s}$ indicates a perfect alignment with the panning direction and a panning-invariant width $r_\mathrm{E} = \mathrm{const.}$ However, the optimal values of

<sup>1</sup>In detail, this follows from $\boldsymbol{D}^\mathrm{T}\boldsymbol{D} = \boldsymbol{V}[\boldsymbol{I},\boldsymbol{0}]\,\boldsymbol{U}^\mathrm{T}\boldsymbol{U}\,[\boldsymbol{I},\boldsymbol{0}]^\mathrm{T}\boldsymbol{V}^\mathrm{T} = \boldsymbol{V}[\boldsymbol{I},\boldsymbol{0}][\boldsymbol{I},\boldsymbol{0}]^\mathrm{T}\boldsymbol{V}^\mathrm{T} = \boldsymbol{V}\boldsymbol{V}^\mathrm{T} = \boldsymbol{I}$.

these integrals are only preserved by discretization with optimal *t* ≥ 2N + 1-design loudspeaker layouts.

*All-round Ambisonic decoding (AllRAD)* is preceded by the work of Batke and Keiler [16]. They describe Ambisonic panning $\boldsymbol{g}_\mathrm{AllRAD}(\boldsymbol{\theta}) = \boldsymbol{D}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta})$ by a decoder $\boldsymbol{D}$ whose result best matches VBAP, $\boldsymbol{g}_\mathrm{VBAP}(\boldsymbol{\theta})$. Without max-$r_\mathrm{E}$ weights yet, we use this here to define AllRAD by a minimum-mean-square-error problem over all panning directions $\boldsymbol{\theta}$:

$$\min_{\boldsymbol{D}} \int_{\mathbb{S}^2} \left\| \boldsymbol{g}_\mathrm{VBAP}(\boldsymbol{\theta}) - \boldsymbol{D}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}) \right\|^2 \mathrm{d}\boldsymbol{\theta}. \tag{4.45}$$

Equivalently, as described by Zotter and Frank [40], who coined the name, we may define AllRAD as VBAP synthesis on the physical loudspeakers, using as multiple-virtual-source input the Ambisonic panning function $g_\mathrm{AMBI}(\boldsymbol{\theta}) = \boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta})\,\mathrm{diag}\{\boldsymbol{a}_\mathrm{N}\}\,\boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{s})$ sampled at an optimal layout of virtual loudspeakers. Here, we write the synthesis as an integral over infinitely many virtual loudspeakers $\boldsymbol{\theta}$,

$$\begin{split} \mathbf{g} &= \int \mathbf{g}\_{\text{VBAP}}(\boldsymbol{\theta}) \, \mathbf{g}\_{\text{AMBI}}(\boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{\theta} = \int \mathbf{g}\_{\text{VBAP}}(\boldsymbol{\theta}) \, \mathbf{y}\_{\text{N}}^{\text{T}}(\boldsymbol{\theta}) \, \text{diag}\{\mathbf{a}\_{\text{N}}\} \mathbf{y}\_{\text{N}}(\boldsymbol{\theta}\_{\text{s}}) \, \mathrm{d}\boldsymbol{\theta} \\ &= \underbrace{\int \mathbf{g}\_{\text{VBAP}}(\boldsymbol{\theta}) \, \mathbf{y}\_{\text{N}}^{\text{T}}(\boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{\theta}}\_{:=\boldsymbol{D}} \, \mathrm{diag}\{\mathbf{a}\_{\text{N}}\} \mathbf{y}\_{\text{N}}(\boldsymbol{\theta}\_{\text{s}}) . \end{split}$$

We can obviously pull the term diag{*a*N} *y*N(*θ*s) out of the integral. The remaining integral defines the AllRAD matrix *D*. We may interpret it as a transformation of the VBAP loudspeaker gain functions *g*VBAP(*θ*) into spherical harmonic coefficients. In the original paper [40], AllRAD is evaluated by an optimal layout of discrete virtual loudspeakers

$$\boldsymbol{D} = \int \boldsymbol{g}_\mathrm{VBAP}(\boldsymbol{\theta})\,\boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \frac{S_{\mathrm{D}-1}}{\mathrm{L}}\sum_{l=1}^{\mathrm{L}} \boldsymbol{g}_\mathrm{VBAP}(\hat{\boldsymbol{\theta}}_l)\,\boldsymbol{y}_\mathrm{N}^\mathrm{T}(\hat{\boldsymbol{\theta}}_l) = \frac{S_{\mathrm{D}-1}}{\mathrm{L}}\,\hat{\boldsymbol{G}}\,\hat{\boldsymbol{Y}}_\mathrm{N}^\mathrm{T}, \tag{4.46}$$

using the directions $\{\hat{\boldsymbol{\theta}}_l\}$ of a *t*-design. As VBAP's gain functions aren't smooth (their derivatives are discontinuous), they are unlimited in order, and a *t*-design of sufficiently high *t* should be used. In 3D practice, the 5200-point Chebyshev-type design from [33] is dense enough. Note that the VBAP part permits improvements by insertion and downmix of imaginary loudspeakers to adapt to asymmetric or hemispherical layouts, as suggested in the original paper [40], cf. Sect. 3.3.

*Note that the decoder needs to be scaled properly. For instance, the norm of the omnidirectional component (first column) could be equalized to one, as it would typically be with a sampling decoder; there are alternative strategies to circumvent the scaling problem* [41].

#### *4.9.7 EPAD and AllRAD on Sub-optimal Layouts*

Figure 4.16 shows the improvement achieved with EPAD and AllRAD on an equiangular arrangement that is sub-optimal due to the missing loudspeaker at −90°. Both decoders manage to either stabilize the loudness perfectly well (EPAD) or keep the directional and spread mapping errors small (AllRAD). We notice that for EPAD, with the constraint $\mathrm{L} \ge 2\mathrm{N}+1$ just fulfilled for N = 3 and L = 7 in the simulation, it would not be possible to simply remove any further loudspeakers without degradation.

#### *4.9.8 Decoding to Hemispherical 3D Loudspeaker Layouts*

In typical loudspeaker playback situations for large audiences, a solid floor and no loudspeakers below ear level are considered practical for several reasons. However, this does not permit decoding by sampling with optimal *t*-design layouts covering all directions. As shown above, EPAD and AllRAD do not require such arrays. And yet, they still require some care when used with hemispherical loudspeaker layouts; see [15, 40] for further reading.

*EPAD with hemispherical loudspeaker layouts*. Even for a hemispherical layout, the energy-preserving decoding method requires $\mathrm{L} \ge (\mathrm{N}+1)^2$ loudspeakers to achieve a perfectly panning-invariant loudness. However, this is counter-intuitive: *Why should one need at least as many loudspeakers on a hemisphere as are required for same-order playback on a full sphere? Shouldn't the number be half as many?*

**Table 4.1** Integration ranges $0 \le \vartheta \le \vartheta_\mathrm{max}$ to obtain $(\mathrm{N}+1)(\mathrm{N}+2)/2$ Slepian functions with minimum loudness fluctuation $\frac{\max E}{\min E}$ for panning on the hemisphere


We can show that while the spherical harmonics are orthonormal on the sphere $\mathbb{S}^2$, i.e. $\int_{\mathbb{S}^2} \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta})\,\boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \boldsymbol{I}$, they aren't orthogonal on the hemisphere $\mathcal{S} = \{\mathbb{S}^2 : \vartheta \le \vartheta_\mathrm{max}\}$:

$$\int\_{\mathcal{S}} \mathbf{y}\_{\mathcal{N}}(\boldsymbol{\theta}) \mathbf{y}\_{\mathcal{N}}^{\mathrm{T}}(\boldsymbol{\theta}) \, \mathrm{d}\boldsymbol{\theta} = \mathbf{G}.\tag{4.47}$$

Here, $\boldsymbol{G}$ is called the Gram matrix, and it is evaluated by $\frac{4\pi}{\hat{\mathrm{L}}}\sum_{l:\,\theta_{z,l}\ge 0} \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_l)\,\boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta}_l)$ using a high-enough *t*-design. By singular-value decomposition of the positive semi-definite matrix $\boldsymbol{G} = \boldsymbol{Q}\,\mathrm{diag}\{\boldsymbol{s}\}\,\boldsymbol{Q}^\mathrm{T}$, with $\boldsymbol{Q}^\mathrm{T}\boldsymbol{Q} = \boldsymbol{Q}\boldsymbol{Q}^\mathrm{T} = \boldsymbol{I}$, we diagonalize $\boldsymbol{G}$ and find new basis functions $\tilde{\boldsymbol{y}}_\mathrm{N}(\boldsymbol{\theta})$, the so-called *Slepian* functions [42], that are orthogonal on $\mathcal{S}$:

$$\mathcal{Q}^{\mathrm{T}}\mathbf{G}\,\mathcal{Q} = \mathrm{diag}\{\mathbf{s}\} = \int\_{\mathcal{S}} \mathcal{Q}^{\mathrm{T}}\mathbf{y}\_{\mathrm{N}}(\boldsymbol{\theta})\mathbf{y}\_{\mathrm{N}}^{\mathrm{T}}(\boldsymbol{\theta})\,\mathcal{Q}\,\mathrm{d}\boldsymbol{\theta}, \quad \Rightarrow \,\tilde{\mathbf{y}}\_{\mathrm{N}}(\boldsymbol{\theta}) = \mathcal{Q}^{\mathrm{T}}\mathbf{y}\_{\mathrm{N}}(\boldsymbol{\theta}).$$

Typically, the singular values in $\boldsymbol{s}$ are sorted in descending order, $s_1 \ge s_2 \ge \dots \ge s_{(\mathrm{N}+1)^2}$, so that it is possible to cut out the basis functions of significantly large contribution to the upper hemisphere $\mathcal{S}$ by

$$
\tilde{\mathbf{y}}\_{\mathcal{N}}(\boldsymbol{\theta}) = [I, \; \mathbf{0}] \; \mathcal{Q}^{\mathrm{T}} \mathbf{y}\_{\mathcal{N}}(\boldsymbol{\theta}).\tag{4.48}
$$

Typically, the numerical integral is extended to slightly below the horizon, see Table 4.1, so that truncation to the $(\mathrm{N}+1)(\mathrm{N}+2)/2$ most significant basis functions, see Fig. 4.17, produces a minimal fluctuation of the loudness measure $\tilde{E} = \|\tilde{\boldsymbol{y}}_\mathrm{N}(\boldsymbol{\theta})\|^2$ for panning on the hemisphere.
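A sketch of Eqs. (4.47) and (4.48) for N = 1, with the Gram matrix integrated numerically on a simple longitude-latitude grid over the cap $\vartheta \le 113^\circ$ (our own crude quadrature, not the *t*-design evaluation mentioned above):

```python
import numpy as np

def sh1_grid(x, y, z):
    """Real first-order harmonics (N3D, ACN order W,Y,Z,X) on direction arrays."""
    c = np.sqrt(3 / (4 * np.pi))
    return np.stack([np.full_like(x, np.sqrt(1 / (4 * np.pi))), c * y, c * z, c * x])

# Longitude-latitude grid over the cap 0 <= zenith <= 113 deg:
phi = np.linspace(-np.pi, np.pi, 400, endpoint=False)
tht = np.linspace(0.0, np.radians(113.0), 400)
PH, TH = np.meshgrid(phi, tht)
w = np.sin(TH) * (phi[1] - phi[0]) * (tht[1] - tht[0])       # surface element
Yg = sh1_grid(np.sin(TH) * np.cos(PH), np.sin(TH) * np.sin(PH), np.cos(TH))

G = np.einsum('iab,jab,ab->ij', Yg, Yg, w)                   # Gram matrix, Eq. (4.47)
s, Q = np.linalg.eigh(G)
s, Q = s[::-1], Q[:, ::-1]                                   # sort descendingly

# The cap is a subset of S^2, so 0 < s_k <= 1 (up to quadrature error); the
# (N+1)(N+2)/2 = 3 strongest eigenvectors span the Slepian mix [I, 0] Q^T:
assert np.all(s > 0) and np.all(s < 1.01)
slepian_mix = Q[:, :3].T                                     # Eq. (4.48)
```

The Slepian functions are then $\tilde{\boldsymbol{y}}_\mathrm{N}(\boldsymbol{\theta}) = $ `slepian_mix` $\cdot\, \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta})$.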

With $\tilde{\boldsymbol{y}}_\mathrm{N}(\boldsymbol{\theta})$, EPAD is calculated in the same way as with the ordinary harmonics

$$
\tilde{Y}\_N^T = \tilde{U} \text{[diag\{s\}, \ 0]}^T \tilde{V}^T, \qquad \qquad \tilde{D} = \tilde{U} \begin{bmatrix} I \ \mathbf{0} \end{bmatrix}^T \tilde{V}^T, \tag{4.49}
$$

with the main difference that the lower limit for the number of loudspeakers decreases to $\mathrm{L} \ge (\mathrm{N}+1)(\mathrm{N}+2)/2$. Interfaced to the spherical harmonics by $[\boldsymbol{I},\ \boldsymbol{0}]\,\boldsymbol{Q}^\mathrm{T}$, the hemispherical energy-preserving decoder becomes

$$D = \tilde{D} \begin{bmatrix} I, \ \mathbf{0} \end{bmatrix} \mathbf{Q}^{\mathrm{T}}.\tag{4.50}$$

**Fig. 4.17** Slepian basis functions for the upper hemisphere, composed of the spherical harmonics up to 3rd order, using $0^\circ \le \vartheta \le 113^\circ$ as integration interval and $(\mathrm{N}+1)(\mathrm{N}+2)/2$ functions. To get these nice shapes, the Slepian functions were found separately for every degree $m$, and they were moreover de-mixed by QR decomposition

*AllRAD with hemispherical loudspeaker layouts*. Because of the vector-base amplitude panning involved, all-round Ambisonic decoding (AllRAD) is comparatively robust to irregular loudspeaker setups. Still, a hemispherical layout does not contain any loudspeaker direction vector pointing into the lower half space, so one could just omit the information of the lower half space. However, the Ambisonic panning function implies a directional spread, so that panning exactly to the horizon also produces content below it, whose omission causes (i) a loss in loudness and (ii) a slight elevation of the perceived direction, cf. Fig. 4.18.

As discussed in the section on triangulation (Sect. 3.3), the insertion of imaginary loudspeakers fixes this behavior. In the case of hemispherical loudspeaker layouts, it is not necessary to downmix the signal of the imaginary loudspeaker at nadir in order to stabilize both loudness and localization for panning to the horizon.

Signal contributions below but close to the horizon largely feed the horizontal loudspeakers, so it is safe to discard the signal that would feed the imaginary loudspeaker at nadir without a loss of loudness. Moreover, this contribution from below also reinforces the signals on the horizontal loudspeakers, so that localization is pulled back down. Both effects can be observed in Fig. 4.18, which shows the loudness measure *E* as well as mislocalization and width by the measure *r*E, using max-*r*E-weighted AllRAD with 5th-order Ambisonics along a vertical panning circle on the IEM mobile Ambisonics Array (mAmbA). It consists of 25 loudspeakers set up in rings of 8, 8, 4, 4, and 1 loudspeakers at 0, 20, 40, 60, and 90 degrees elevation. Rings two and four start at 0 degrees azimuth; the others are rotated by half the loudspeaker spacing.
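Both measures follow directly from the panning gains: the loudness measure is E = Σ g²ₗ, and the energy vector r_E = Σ g²ₗ θₗ / Σ g²ₗ gives the mapped direction, while arccos of its length expresses the width. A minimal sketch with a hypothetical helper, assuming unit loudspeaker direction vectors:

```python
import numpy as np

# Hypothetical helper: energy measure E and energy vector r_E computed from
# panning gains g_l and unit loudspeaker direction vectors theta_l.
def energy_measures(gains, directions):
    g2 = gains**2
    E = g2.sum()
    rE = (g2[:, None] * directions).sum(axis=0) / E
    width = np.degrees(np.arccos(np.clip(np.linalg.norm(rE), 0.0, 1.0)))
    return E, rE, width

# A single active loudspeaker localizes exactly there, with zero width.
dirs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
E, rE, width = energy_measures(np.array([1.0, 0.0, 0.0]), dirs)
```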

*Performance comparison on hemispherical layouts*. Figure 4.18 shows a comparison of AllRAD and EPAD decoding to the 25-channel mAmbA hemispherical loudspeaker layout.

**Fig. 4.18** Perceptual measures for 5th max-*r*E-weighted AllRAD on the IEM mAmbA layout [43] with (black) and without insertion of the bottom imaginary loudspeaker (black dotted) whose signal is disposed, and max-*r*E-weighted EPAD (gray), for panning on a vertical circle: *E* in dB (top), orientation error of *r*<sup>E</sup> in degrees (middle), and width expressed by arccos *r*E in degrees (bottom). The thin dashed line shows AllRAD without imaginary loudspeakers

While (top in Fig. 4.18) AllRAD produces a loudness fluctuation roughly spanning 1 dB for panning on the hemisphere, EPAD only exhibits 0.3 dB, as specified in Table 4.1. While loudness differences of less than 0.5 dB can be heard in monophonic playback of noise, it is safe to assume that a weak directional loudness fluctuation of less than 1 dB is normally inaudible. In this regard, loudness fluctuation should be no problem with either EPAD or AllRAD.

Concerning the directional mapping, EPAD produces a more strongly pronounced ripple, with *r*E indicating sounds on the horizon ϑs = ±90° to be pulled upwards towards 0° more with EPAD (7°) than with AllRAD (3°). In terms of width, both EPAD and AllRAD exhibit the ≈20° average associated with max-*r*E weighting. However, EPAD also produces a greater fluctuation, and it widens up to about 30° for panning to the horizon ϑs = ±90°.

With the 9 loudspeakers of the ITU [44] 4 + 5 + 0 layout (horizontal ring: ϕ = 0°, ±30°, ±120°; upper ring at 40° elevation with ϕ = ±30°, ±120°), it is not possible anymore to use EPAD with 5th order, which would be the optimal resolution for the front loudspeaker triplet. EPAD only supports orders up to N = 2, and to lose level towards below-horizon directions, we can use the reduced set of 6 Slepian functions; alternatively, all 9 spherical harmonics of N = 2 would also be conceivable. For AllRAD, imaginary loudspeakers are inserted at the sides at azimuth/elevation ±75°/27°, up at 0°/78°, back at 180°/35°, and below at 0°/−90°. It is reasonable to downmix the imaginary loudspeakers for up, sides, and back with a factor of one and to re-normalize the VBAP gain matrix, while discarding the signal of the imaginary loudspeaker below. AllRAD permits the use of order N = 5, which resolves the frontal loudspeaker triplet much better for horizontal panning.

Figure 4.19 shows the result of max-*r*E-weighted 2nd-order EPAD and 5th-order AllRAD for the 4 + 5 + 0 layout using a vertical panning curve. While the perfectly constant loudness measure of EPAD might be favored over the almost +3 dB loudness

**Fig. 4.19** Perceptual measures for Ambisonic panning on the ITU [44] 4 + 5 + 0 layout with insertion and downmix of imaginary loudspeakers at the sides, back, and top, and insertion and disposal of an imaginary loudspeaker below for AllRAD. Measures are evaluated for max-*r*Eweighted 5th-order AllRAD (black) and 2nd-order EPAD (gray), for panning on a vertical circle: *E* in dB (top), orientation error of *r*<sup>E</sup> in degrees (middle), and width expressed by arccos *r*E in degrees (bottom)

increase at front and back for AllRAD, AllRAD's lower directional error, narrower width mapping, greater flexibility, and simplicity have often proven to be clearly superior in practice.

#### **4.10 Practical Studio/Sound Reinforcement Application Examples**

This section analyzes the application of 3D Ambisonic amplitude panning, consisting of encoding and AllRAD, to studios (with typical setups of 2 m radius) and to sound reinforcement (for audiences of, e.g., 250 people). Application scenarios are sketched in [43], and various other examples are given below. The requirements of constant loudness and width are analyzed below, and as sound reinforcement requires a particularly large sweet area, the *r*E vector model for off-center listening positions from Sect. 2.2.9 is used to depict the sweet-area size.

The analysis of decoders above described loudness measures for panning on a circle. To observe them with panning across all directions in Figs. 4.20 and 4.22, world-map-like mappings using a gray-scale representation of the loudness and width measures are more reasonable. For several loudspeaker layouts, their axes are azimuth horizontally and zenith angle vertically, and the gray-scale map displays the loudness measure *E* in dB (left column) and the width measure arccos *r*E in degrees (right column). As 5th-order max-*r*E-weighted AllRAD typically produces minor directional mapping errors, they are not explicitly shown in Figs. 4.20 and 4.22. However, the mappings of the sweet-area size of plausible localization in Figs. 4.21 and 4.23 illustrate the usefulness of the systems for the listening areas hosting the number of listeners targeted for either the studio or the sound reinforcement application.

Figure 4.20 illustrates AllRAD's tendency to attenuate signals panned between too closely spaced loudspeakers, as in the front section of the ITU [44] 4 + 5 + 0. By

(b) IEM Production Studio (4 listeners)

**Fig. 4.20** Comparison of 5th-order max-*r*E-weighted AllRAD for panning across all directions on hemispherical loudspeaker layouts in studios. The left column shows the loudness measure *E* in dB and the right column the width measure arccos *r*E in degrees; the loudspeaker positions are marked with a white + sign

**Fig. 4.21** Comparison of the calculated sweet-area size for 5th-order max-*r*E-weighted AllRAD for panning across all directions on hemispherical loudspeaker layouts in studios. As a plausibility definition, the directional mapping errors depending on the listening position should stay within angular bounds (e.g. 10°)

contrast, for instance, the mAmbA layout in Fig. 4.22 only has 8 loudspeakers on the horizon, and signals panned to the widely spaced below-horizon triangles tend to get louder. Moreover, it is easier for loudspeaker systems with many channels, such as the IEM CUBE, mAmbA, Lobby, and Ligeti Hall in Fig. 4.22, to yield smooth loudness and width mappings. Still, also with only a few loudspeakers, a slight direction adjustment in the layout can fix some of the behavior, as with the IEM Production Studio, whose ±45° loudspeakers in the elevated layer are superior to a ±30° spacing.

(d) KUG Ligeti Hall (250 listeners)

**Fig. 4.22** Comparison of 5th-order max-*r*E-weighted AllRAD for panning across all directions on various hemispherical loudspeaker layouts for sound reinforcement. The left column shows the loudness measure *E* in dB and the right column the width measure arccos *r*E in degrees; the loudspeaker positions are marked with a white + sign

A hint for designing good decoders is sometimes idealization: often it is better to disregard the true loudspeaker setup locations and feed the decoder design with idealized positions instead. One can hereby trade slight directional distortions for a more uniform loudness distribution. For instance, at the IEM CUBE, loudspeaker

**Fig. 4.23** Comparison of the calculated sweet-area size for 5th-order max-*r*E-weighted AllRAD for panning across all directions on various hemispherical loudspeaker layouts for sound reinforcement. As a plausibility definition, the directional mapping errors depending on the listening position should stay within angular bounds (e.g. 10°)

locations of the horizontal ring could be idealized to a 30° spacing to get a smoother loudness mapping than the one shown in Fig. 4.22.

#### **4.11 Ambisonic Decoding to Headphones**

Typically, Ambisonic decoding to headphones can be done similarly as with loudspeakers, except that the loudspeaker signals are rendered to headphones by convolution with the head-related impulse responses (HRIRs) of the corresponding playback directions. Various databases of such HRIRs can be found, e.g., on the website SOFAconventions.<sup>2</sup> This headphone decoding approach classically uses a small set of so-called *virtual loudspeakers*, as found in many places in the technical literature, e.g. in the pioneering works of Jean-Marc Jot et al. [9] or Jérôme Daniel [10]. It is relevant in many other important works [18, 45, 46] and the SADIE project,<sup>3</sup> and it is employed in Sect. 1.4.2 on first-order Ambisonics.

*Coarse*. However, as outlined in some research papers [9, 18, 46], these approaches have in common that low-order Ambisonic synthesis is problematic. When inserting a *dense grid* of virtual-loudspeaker HRIRs, the Ambisonic smoothing attenuates high frequencies at frontal and dorsal directions. Or, what had been the solution for a long time, a *coarse grid* of virtual-loudspeaker HRIRs does not attenuate high frequencies, but then the spatial quality strongly depends on the particular grid layout or orientation [46]. An early paper by Jot [9] proposed to remove the time delays of the HRIRs before Ambisonic decomposition, and then to re-insert the otherwise missing interaural time delay afterwards for any sound panned in Ambisonics, which unfortunately yields an *object-based* panning system rather than a *scene-based* Ambisonic system.

*Dense*. Some dense-grid approaches propose to keep the HRIR time delays, or, if formulated in the frequency domain, the phases of the head-related transfer functions (HRTFs), and hereby stay in a scene-based Ambisonic format, while correcting spectral deficiencies by diffuse-field or interaural-covariance equalization [18, 47]. Finally, the most recent solutions proposed by Jin, Sun, and Epain [17, 48] or Zaunschirm, Schörkhuber, and Höldrich [20, 21] modify the HRIR time delays/HRTF phases only above, e.g., 3 kHz, without any object-based re-insertion afterwards. The omission of high-frequency interaural time-delay/phase information is a reasonable trade-off made in favor of the more important accuracy in spectral magnitude.

*What does directional HRIR smoothing do to high frequencies?* The geometrical theory of diffraction [49] suggests that HRIRs must always contain at least the delay to the ear of either the shortest direct path or the shortest indirect path via the surface of the head. For a spherical head model with the radius R = 0.0875 m and speed of sound c = 343 m/s, the Woodworth–Schlosberg formula [50] is composed of this

<sup>2</sup>https://www.sofaconventions.org.

<sup>3</sup>https://www.york.ac.uk/sadie-project/database.html.

**Fig. 4.25** Time delay to the ear depending on the azimuth of horizontal sound: (a) model and (b) 360°-measured KU100 dummy-head HRIR set from TH Köln (color displays dB levels)

consideration, see Fig. 4.24. The left ear receives a distant horizontal sound from the azimuth interval 0 ≤ φ ≤ π/2 as direct sound anticipated by τ = −(R/c) sin φ, or for −π/2 < φ ≤ 0 as an indirect sound delayed by τ = −(R/c) φ,

$$\tau(\phi) = -\frac{R}{c} \, \min\left\{ \sin \phi, \,\phi \right\}, \tag{4.51}$$

as plotted in Fig. 4.25a, and recognizable from dummy-head measurements4 in Fig. 4.25b.

If the HRIR is smoothed across an angular range, the time-delay curve gets spread across time as well, see Fig. 4.26. In this way, depending on whether the smoothing uses a continuous or discrete set of directions, one either obtains something like a comb filter or a sinc-shaped frequency response. This smoothing is least disturbing around the direct-ear side, as shown left in Fig. 4.26, and, as the indirect ear also encounters high-frequency shadowing effects, it is most disturbing mainly for frontal and rear sounds at 0° or 180°, as shown right in Fig. 4.26. The corresponding frequency responses are roughly exemplified by what third-order-equivalent Ambisonic smoothing would do to either 45°-spaced HRIRs in Fig. 4.27a or 15°-spaced ones in Fig. 4.27b.
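The effect can be reproduced in a few lines: summing unit impulses with the delays τ(φ) of Eq. (4.51) across a smoothing window yields a nearly flat response around 90° (direct-ear side), but a pronounced high-frequency attenuation around 0°. A rough sketch with assumed parameters (±22.5° window, evaluated at 8 kHz):

```python
import numpy as np

R, c = 0.0875, 343.0                       # head radius, speed of sound
tau = lambda phi: -(R / c) * np.minimum(np.sin(phi), phi)   # Eq. (4.51)

def smoothed_response(center, half_width_deg=22.5, f=8000.0, n=64):
    # magnitude of the average of delayed unit impulses within the window,
    # evaluated at a single frequency f (assumed example parameters)
    phi = center + np.radians(np.linspace(-half_width_deg, half_width_deg, n))
    return np.abs(np.mean(np.exp(-2j * np.pi * f * tau(phi))))

H_front = smoothed_response(0.0)           # delays spread linearly: strong loss
H_side = smoothed_response(np.pi / 2)      # delays nearly constant: little loss
```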

<sup>4</sup>Data HRIR\_CIRC360.sofa from http://sofacoustics.org/data/database/thk.

**Fig. 4.26** Directionally smoothed playback to multiple HRIRs within a window, e.g. 30◦, causes different impulse response shapes at 90◦ and 0◦

**Fig. 4.27** Differences of directionally smoothed HRTF frequency responses for a horizontal sound from either 0° or 90°, smoothed within a ±22.5° window, which roughly corresponds to the Ambisonics order N = 3; **a** and **b** either use a grid of 45°- or 7.5°-spaced HRIRs. The dashed line shows the theoretical frequency limit 1.87 kHz for N = 3

To get an upper frequency limit, it is insightful to work in the frequency domain, where the HRIR is denoted head-related transfer function (HRTF). A simplified linearized-phase version around φ = 0 uses τ ≈ −(R/c) φ, and the resulting Fourier transform with ω = 2π *f* is

$$H \approx e^{-i\omega\,\tau(\phi)} = e^{i\frac{\omega}{c}R\,\phi}.\tag{4.52}$$

To represent it by a circular or spherical harmonics transformation limited to the order N, a maximum phase change represented by the harmonic *e*<sup>iNφ</sup> implies that we can only resolve the phase up to (ω/c) R ≤ N, hence the range of accurate operation is limited in frequency

$$f\_{\mathrm{N}} \le \frac{c \cdot \mathrm{N}}{2\pi \,\mathrm{R}} = \mathrm{N} \cdot 624 \,\mathrm{Hz}.\tag{4.53}$$

As the high-frequency HRTF phase evolves more rapidly over the angle than what the finite order can represent, this typically yields an attenuation of the high frequencies when obtaining circular/spherical harmonic coefficients by the transformation integral.
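The limit of Eq. (4.53) is quickly checked numerically:

```python
import numpy as np

R, c = 0.0875, 343.0                        # sphere-model head radius, speed of sound
f_lim = lambda N: c * N / (2 * np.pi * R)   # Eq. (4.53): upper frequency limit
```

For N = 1 this gives about 624 Hz, and for N = 3 about 1.87 kHz, the dashed limit in Fig. 4.27.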

Directional smoothing of the discrete directional HRTFs causes relevant spectral problems, regardless of whether the directional smoothing is done by Ambisonics, VBAP, or MDAP. Mainly the geometric delay in the HRIRs is responsible for the emerging comb-filter or low-pass behavior. One could pull out the linear phase trend above the frequency limit and re-insert it, but is re-insertion necessary?

#### *4.11.1 High-Frequency Time-Aligned Binaural Decoding (TAC)*

As a prerequisite for their binaural Ambisonic decoders, Schörkhuber et al. [21] tested above which frequency the removal of the HRTF linear phase trend remains inaudible in direct HRTF-based rendering without panning or smoothing. In fact, most of their listeners could not detect the absence of the linear phase trend when it was removed above 3 kHz for various sound examples (drums, speech, pink noise, rendered at the directions 10°, −45°, 80°, −130°). They had their subjects compare the result to a reference with unaltered HRTFs, and the result is analyzed in Fig. 4.28.

By this finding, it is possible to split up each of the 2 × 1 HRIRs *h*(*t*, *θ*) into an unaltered low-pass band and a time-aligned high-pass band to unify the high-frequency HRIR delay

$$\hat{h}(t,\theta) = h\_{\leq 3kHz}(t,\theta) + \begin{bmatrix} h\_{\text{left} > 3kHz}[t - \tau(\arcsin \theta\_{\text{y}}), \theta] \\ h\_{\text{right} > 3kHz}[t + \tau(\arcsin \theta\_{\text{y}}), \theta] \end{bmatrix}. \tag{4.54}$$

The time-delay model τ(φ) uses the angle to the left/right ear on the positive/negative *y* axis, i.e. arccos(±θy), but shifted by 90°, hence φ = ± arcsin θy.

This removal allows the use of all available HRIRs of dense measurement sets for binaural synthesis of high accuracy, using a suitable linear Ambisonic decoder such as AllRAD. Assuming the resulting modified left and right HRIRs for all directions are denoted as the 2 × L matrix *H***ˆ**(*t*) = [*h***ˆ**(*t*, **θ**1), . . . , *h***ˆ**(*t*, **θ**L)]<sup>T</sup>, the 2 × (N + 1)<sup>2</sup> filter set for decoding each of the Ambisonic channels to the ears becomes:

$$
\hat{H}\_{\text{SH}}^{\text{T}}(t) = \hat{H}(t) \,\mathbf{D} \,\text{diag}\{\boldsymbol{\mathfrak{a}}\}.\tag{4.55}
$$

**Fig. 4.28** Experiment on the audibility of the removal of the linear phase trend from HRTFs above a varied cutoff frequency, from [21], showing medians and 95% confidence intervals

**Fig. 4.29** Exemplary horizontal cross-sections of linear (lin), time-aligned (ta), and MagLS/magnitude-least-squares (mls) third-order N = 3, compared to high-order N = 35 (max), Ambisonic left-ear HRTF representations of the TH Köln HRIR\_L2702.sofa set

Results achieved by pseudo-inverse decoding to the hereby time-aligned HRIRs, using R = 0.085 m and N = 3 with the 2702-direction Cologne HRIRs,<sup>5</sup> are shown in Fig. 4.29. The resulting polar patterns (ta) clearly outperform the linear decomposition (lin) at frequencies above 2 kHz in representing the original HRTFs (max).

#### *4.11.2 Magnitude Least Squares (MagLS)*

As an alternative to the high-frequency time-delay disposal, Schörkhuber et al. present an optimum-phase approach [21] that disregards the phase match in favor of an improved magnitude match above the cutoff. Formulated exemplarily for the left ear, across every HRTF direction **θ***l*, and for every discrete frequency ω*k*, with *h*<sub>l,k</sub> = *h*(**θ***l*, ω*k*), this becomes

$$\min\_{\hat{h}\_{\text{SH},k}} \sum\_{l=1}^{L} \left[ |\mathbf{y}\_{\text{N}}(\boldsymbol{\theta}\_{l})^{\text{T}}\hat{h}\_{\text{SH},k}| - |h\_{l,k}| \right]^{2}. \tag{4.56}$$

Typically, one would need to solve such magnitude least-squares or magnitude-squared least-squares tasks with semidefinite relaxation, see Kassakian [51].

In practice, however, results already turn out to be practically perfect with an iterative combination of the reconstructed phase φ̂<sub>l,k−1</sub> from the previous frequency ω<sub>k−1</sub> with the HRTF magnitude |*h*<sub>l,k</sub>| of the current frequency ω*k*, before a linear decomposition thereof into the spherical harmonic coefficients *h***ˆ**<sub>SH,k</sub>.

Every frequency below the cutoff, ω*k* < 2π *f*<sub>N</sub>, just uses the linear least-squares spherical harmonics decomposition with the left inverse of the spherical harmonics matrix *Y*<sub>N</sub> sampled at the HRTF measurement nodes,

<sup>5</sup>Data HRIR\_L2702.sofa from http://sofacoustics.org/data/database/thk.

$$
\hat{h}\_{\rm SH,k} = (Y\_{\rm N}^{\rm T} Y\_{\rm N})^{-1} Y\_{\rm N}^{\rm T} \left[ h\_{l,k} \right]\_l. \tag{4.57}
$$

Continuing with the first frequency above/equal to cutoff ω*<sup>k</sup>* ≥ 2π *f*N, the algorithm proceeds as:

$$
\hat{\phi}\_{l,k-1} = \angle \left\{ \mathbf{y}\_{\text{N}}(\boldsymbol{\theta}\_{l})^{\text{T}} \,\hat{\mathbf{h}}\_{\text{SH},k-1} \right\} \,, \tag{4.58}
$$

$$\hat{h}\_{\rm SH,k} = (Y\_{\rm N}^{\rm T} Y\_{\rm N})^{-1} Y\_{\rm N}^{\rm T} \left[ |h\_{l,k}| \, e^{i\hat{\phi}\_{l,k-1}} \right]\_l \,, \tag{4.59}$$

and then moves on to the next frequency *k* ← *k* + 1. The results are typically transformed back to the time domain to get a real-valued impulse response for every spherical harmonic and the regarded ear.
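The iteration of Eqs. (4.57)–(4.59) is compact enough to sketch directly. The toy example below is an assumption for illustration only: it uses 2D circular harmonics and a synthetic "HRTF" instead of a measured spherical dataset, with a square system of L = 2N + 1 directions so that magnitudes at the nodes are matched exactly:

```python
import numpy as np

# Toy MagLS sketch in 2D (circular harmonics) with a synthetic "HRTF" --
# an assumption for illustration; the real method uses spherical harmonics
# and measured HRTF sets. L = 2N+1 gives a square, exactly solvable system.
N = 3
L = 2 * N + 1
theta = 2 * np.pi * np.arange(L) / L
Y = np.column_stack([np.ones(L)] + [f(m * theta) for m in range(1, N + 1)
                                    for f in (np.cos, np.sin)])

R, c = 0.0875, 343.0
freqs = np.linspace(100.0, 16000.0, 40)             # Hz
tau = -(R / c) * np.sin(theta)                      # simple delay model
h = (1 + 0.5 * np.cos(theta))[:, None] * \
    np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])

f_cut = c * N / (2 * np.pi * R)                     # cutoff from Eq. (4.53)
h_sh = np.zeros((2 * N + 1, len(freqs)), dtype=complex)
for k, f in enumerate(freqs):
    if f < f_cut:                                   # Eq. (4.57): complex LS
        h_sh[:, k] = np.linalg.lstsq(Y, h[:, k], rcond=None)[0]
    else:                                           # Eqs. (4.58)-(4.59)
        phi = np.angle(Y @ h_sh[:, k - 1])          # reuse previous phase
        h_sh[:, k] = np.linalg.lstsq(Y, np.abs(h[:, k]) * np.exp(1j * phi),
                                     rcond=None)[0]
```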

The results of the MagLS approach (mls) outperform the time-alignment approach (ta) in the exemplary results shown for N = 3 in Fig. 4.29, in particular at the highest frequencies, where the sphere-model-based delay simplification is no longer sufficiently helpful.

#### *4.11.3 Diffuse-Field Covariance Constraint*

Also for both of the above approaches that modify the high-frequency phase, Zaunschirm et al. [20] note that low-order rendering degrades envelopment in diffuse fields, so they introduce an additional covariance constraint as defined by Vilkamo [22]. It can be implemented as a 2 × 2 filter matrix equalizing the resulting frequency-domain diffuse-field covariance matrix to the one of the original HRTF dataset. On its main diagonal, this covariance matrix shows the diffuse-field ear sensitivities (left and right), and off-diagonal it contains the diffuse-field interaural cross-correlation.

At every frequency, the 2 × 2 diffuse-field covariance matrix of the original, very-high-order spherical harmonics HRTF dataset *H*<sub>SH</sub> of the dimensions (M + 1)<sup>2</sup> × 2, with M ≫ N, is given by

$$\mathcal{R} = H\_{\text{SH}}^{\text{H}} H\_{\text{SH}}.\tag{4.60}$$

The derivation of why this inner product of spherical harmonic coefficients represents the diffuse-field covariance is given in Appendix A.5. The low-order, high-frequency-modified HRTF coefficient set *H***ˆ**<sub>SH</sub> of the dimensions (N + 1)<sup>2</sup> × 2 also has a 2 × 2 covariance matrix *R***ˆ** that will differ from the more accurate *R*,

$$
\hat{\mathbf{R}} = \hat{\boldsymbol{H}}\_{\text{SH}}^{\text{H}} \hat{\mathbf{H}}\_{\text{SH}}.\tag{4.61}
$$

Its diffuse-field reproduction improves after equalizing *R***ˆ** to *R* by a 2 × 2 filter matrix,

**Fig. 4.30** Covariance constraint filters enhance the binaural decorrelation of MagLS by negative crosstalk *M*<sup>12</sup> and *M*21, under corresponding correction of the diffuse-field sensitivities *M*<sup>11</sup> and *M*<sup>22</sup> at playback orders N < 3

$$
\hat{H}\_{\text{SH,corr}} = \hat{H}\_{\text{SH}} \mathbf{M}.\tag{4.62}
$$

Appendix A.5 shows the derivation of *M* based on [20, 22]. In summary, it is composed of factors obtained by Cholesky and SVD matrix decompositions

$$
\hat{H}\_{\text{SH,corr}} = \hat{H}\_{\text{SH}} \hat{X}^{-1} V U^{\text{H}} X,\tag{4.63}
$$

$$
\text{Cholesky factors: } H\_{\text{SH}}^{\text{H}} H\_{\text{SH}} = X^{\text{H}} X, \quad \hat{H}\_{\text{SH}}^{\text{H}} \hat{H}\_{\text{SH}} = \hat{X}^{\text{H}} \hat{X}, \qquad \text{SVD: } \hat{X}^{\text{H}} X = U S V^{\text{H}}.
$$

While MagLS binaural decoding with orders higher than 2 or 3 does not require the covariance correction, the correction enhances the decorrelation of the ear signals for 1st- and 2nd-order reproduction, as shown in Fig. 4.30.
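The correction in Eq. (4.63) is easy to verify numerically: after applying M = X̂⁻¹ V Uᴴ X, the corrected low-order set reproduces the high-order diffuse-field covariance exactly. A sketch with random stand-in coefficient matrices (illustration only, not HRTF data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Random stand-ins for the coefficient sets (illustration, not HRTF data):
# high order M = 6 -> 49 coefficients, low order N = 2 -> 9, for two ears.
H = rng.standard_normal((49, 2)) + 1j * rng.standard_normal((49, 2))
H_hat = rng.standard_normal((9, 2)) + 1j * rng.standard_normal((9, 2))

def chol_factor(A):
    # upper-triangular X with A = X^H X (numpy returns lower L with A = L L^H)
    return np.linalg.cholesky(A).conj().T

X = chol_factor(H.conj().T @ H)                 # R = X^H X, Eq. (4.60)
X_hat = chol_factor(H_hat.conj().T @ H_hat)     # R_hat = X_hat^H X_hat, Eq. (4.61)
U, s, Vh = np.linalg.svd(X_hat.conj().T @ X)    # SVD: X_hat^H X = U S V^H
M = np.linalg.solve(X_hat, Vh.conj().T @ U.conj().T @ X)
H_corr = H_hat @ M                              # Eqs. (4.62)/(4.63)
```

Substituting back shows why this works: M ᴴ X̂ᴴX̂ M = Xᴴ U Vᴴ V Uᴴ X = Xᴴ X = R.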

#### **4.12 Practical Free-Software Examples**

#### *4.12.1 Pd and Circular/Spherical Harmonics*

Similarly to the example section on first-order encoding and decoding in Pure Data (Pd), Fig. 4.31 shows 3rd-order 2D Ambisonic encoding and decoding for an octagon loudspeaker layout. The implementation [mtx\_circular\_harmonics] of the circular harmonics is used from the iemmatrix library, and the numbers for 180/π = 57.29 and *a*<sub>m</sub> = cos(π*m*/(2(N+1))) were pre-calculated. Note the similarity to the first-order 2D example of Fig. 1.13, to which the main change is the use of the circular harmonics matrix object.
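Outside Pd, the same 2D signal path can be sketched in a few numpy lines — a hypothetical re-implementation for illustration, not the iemmatrix code:

```python
import numpy as np

# Hypothetical numpy re-implementation of the patch's signal path (not the
# iemmatrix code): 3rd-order 2D encoding, max-rE weighting, sampling decoder.
N, L = 3, 8                                     # order, octagon

def y2d(phi):                                   # circular harmonics
    return np.array([1.0] + [f(m * phi) for m in range(1, N + 1)
                             for f in (np.cos, np.sin)])

# max-rE weights a_m = cos(pi*m / (2(N+1))), repeated for cos/sin pairs
a = np.array([1.0] + [np.cos(np.pi * m / (2 * (N + 1)))
                      for m in range(1, N + 1) for _ in (0, 1)])

spk = 2 * np.pi * np.arange(L) / L              # equidistant loudspeakers
phi_s = np.radians(100.0)                       # panning direction
gains = np.array([y2d(p) @ (a * y2d(phi_s)) for p in spk]) * 2 / L
```

As expected for a panning function, the largest gain goes to the loudspeaker closest to the panning direction.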

For decoding to headphones, programming in Pd also looks rather similar to the first-order example in Fig. 1.14, only more HRIRs matching the respective

**Fig. 4.31** 2D encoding and decoding in Pd using [mtx\_circular\_harmonics] with 3rd order, 8 equidistant loudspeakers, and max-*r*<sup>E</sup> weighted decoder

loudspeaker positions need to be employed. To work in 3 dimensions, programming in Pd would also be similar to the corresponding first-order example of Fig. 1.15, using the matrix object [mtx\_spherical\_harmonics]. Typically, pre-calculated decoders including AllRAD and max-*r*E weighting are used and loaded by, e.g., [mtx D.mtx] into Pd to keep the programming simple.

#### *4.12.2 Ambix Encoder, IEM MultiEncoder, and IEM AllRADecoder*

For encoding single- or multi-channel signals into Ambisonics, the ambix\_encode\_o&lt;N&gt; or ambix\_encode\_i&lt;L&gt;\_o&lt;N&gt; VST plug-ins are available from Kronlachner's ambix plug-in suite, as well as the IEM MultiEncoder from the IEM plug-in suite. As exemplarily shown in Fig. 4.32, the multi encoder allows encoding channel-based multi-channel audio material, where *channel-based* [52] typically refers to each channel of the multi-channel material being meant to be played back on a separate loudspeaker of clearly defined direction, cf. [44]. Elsewhere, the embedding of virtual playback directions can also be found referred to as *beds* or *virtual panning spots*.

**Fig. 4.32** MultiEncoder plug-in: encoding of a 4 + 5 + 0 recording

**Fig. 4.33** AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio

**Fig. 4.34** AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio

**Fig. 4.35** AllRADecoder plug-in: 5 + 7 + 0 layout from IEM Production Studio

The IEM AllRADecoder permits manually entering or importing the loudspeaker coordinates and channel indices, with the coordinates specified by the azimuth and elevation angles in degrees, as exemplified for the IEM Production Studio in Fig. 4.33. The figure also shows that just entering the pure 5 + 7 + 0 layout would produce the error message *Point of origin not within convex hull. Try adding imaginary loudspeakers.*

By adding an imaginary loudspeaker below, whose signal is typically omitted, see Fig. 4.34, it becomes geometrically valid to calculate and employ the resulting decoder. However, it is better to also insert an imaginary loudspeaker at the rear whose signal is preserved by specifying the gain value 1, as shown in Fig. 4.35.

#### *4.12.3 Reaper, IEM RoomEncoder, and IEM BinauralDecoder*

Particularly relevant for headphone-based listening, the rendering of anechoic sounds will typically not *externalize* well, as it does not match the mental expectation of ordinary listening environments [53–56]. To avoid that this causes in-head localization rather than the desired external sound image, one can, e.g., use the IEM RoomEncoder plug-in, see Fig. 4.36. It is based on an image-source room model

**Fig. 4.36** RoomEncoder plug-in

**Fig. 4.37** BinauralDecoder plug-in

and encodes first-order wall-reflections involving reflection factors and propagation delays together with the desired direct sound.

The MagLS approach for Ambisonic decoding, using the KU100 measurements from TH Köln and (optionally) their headphone equalization curves, is implemented by the IEM BinauralDecoder, see Fig. 4.37.

Combining both the IEM RoomEncoder and the IEM BinauralDecoder with an Ambisonics-encoded single-channel sound (e.g. using ambix\_encoder), one can simply try to place the source and receiver together in the symmetry plane of the room, and then slightly shift one of the two sideways to see how externalization improves by a slight asymmetry in the ear signals.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 5 Signal Flow and Effects in Ambisonic Productions**

*This system offers significantly enhanced spatialisation technologies […] with new creative possibilities opening up to anyone with the appropriate number of audio channels available on their computer systems.*

Dave G. Malham [1], ICMC, Beijing, 1999.

**Abstract** This chapter presents the internal working principles of various Ambisonic 3D audio effects. No matter which digital audio workstation or processing software is used in a production, the general Ambisonic signal infrastructure is outlined as an important overview of the signal processing chain. The effects presented are frequency-independent effects such as directional re-mapping (mirror, rotation, warping) and re-weighting (directional level modification), and frequency-dependent effects such as widening/distance/diffuseness, diffuse reverberation, and resolution-enhanced convolution reverberation.

The typical audio processing steps for Ambisonic surround-sound signal manipulation are shown in the block diagram of Fig. 5.1 from [2]. The descriptions of the multi-venue application in [3] and of live effects in [4] might be encouraging.

*Ambisonic encoding and Ambisonic bus*. From the previous chapters we know that representing single-channel signals *s*c(*t*) together with their direction **θ**c is a matter of encoding: of multiplying the signal by the coefficients *y*N(**θ**c) obtained by evaluating the spherical harmonics at the direction from which the signal should appear to come. In productions, there will be multiple signals, representing either *spot microphones* or *virtual playback spots* of embedded channel-based content (*beds*), e.g. stereo or 5.1 material. With all input signals encoded and summed up on an *Ambisonic bus*, we obtain the multi-channel Ambisonic signal representation of an entire audio production

**Fig. 5.1** Block diagram as in [2]

$$\boldsymbol{\chi}\_{\rm N}(t) = \sum\_{c=1}^{\rm C} \mathbf{y}\_{\rm N}(\boldsymbol{\theta}\_c) \, s\_c(t). \tag{5.1}$$

*Ambisonic surround-sound signal*. Without decoding to a specific loudspeaker layout, the signal *χ*N of the *Ambisonic bus* might appear somewhat virtual. Nevertheless, it can be regarded as a surround-sound signal *x*(*θ*, *t*) whose amplitude can be evaluated and metered at any direction *θ* and any time *t*, using the expansion into spherical harmonics

$$x(\boldsymbol{\theta},t) = \mathbf{y}\_{\mathrm{N}}^{\mathrm{T}}(\boldsymbol{\theta})\,\, \boldsymbol{\chi}\_{\mathrm{N}}(t). \tag{5.2}$$
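The encoding onto the bus and the directional metering amount to plain matrix products. A sketch using 2D circular instead of spherical harmonics, with two assumed source signals:

```python
import numpy as np

# 2D sketch of the Ambisonic bus (circular instead of spherical harmonics,
# assumed example signals): encode sources, sum, and meter any direction.
N = 3

def yN(phi):                                    # 2D circular-harmonic encoder
    return np.array([1.0] + [f(m * phi) for m in range(1, N + 1)
                             for f in (np.cos, np.sin)])

t = np.linspace(0.0, 1.0, 100)
sources = [(np.radians(30.0), np.sin(2 * np.pi * 5 * t)),    # signal from 30 deg
           (np.radians(-110.0), np.cos(2 * np.pi * 3 * t))]  # signal from -110 deg

# encode every source and sum onto the Ambisonic bus
chi = sum(np.outer(yN(phi), s) for phi, s in sources)

# evaluate/meter the surround-sound signal at any direction
x = lambda phi: yN(phi) @ chi
```

Metering x(θ, t) at a source's direction yields a markedly stronger signal than at a direction far away from both sources.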

*Upmixing*. As first-order recordings are not highly resolved, there are several works on algorithms with resolution-enhancement strategies that re-assign time-frequency bins more sharply to directions. A good summary of such *input-specific insert effects* is given in the book [5, 6]. Available solutions are DirAC, HOA-DirAC, COMPASS, and Harpex.

*Higher order*. Higher-order microphones require more of the acoustic holophonic and holographic basics than presented above, yielding pre-processing filters as an *input-specific insert effect*. Higher-order recording is dealt with in the subsequent Chap. 6, after the derivation of the wave equation and its solutions in the spherical coordinate system.

*Insert effects: Generic re-mapping and leveling*. One can imagine that it should be possible to manipulate the surround-sound signal *x*(*θ*, *t*) in various ways. For instance, effects based on directional re-mapping can take signals out of their original directional range and place them back into the Ambisonic signal at manipulated directions. Also, directions can be altered in their amplitude levels so that, for instance, signals at directions with unwanted content undergo attenuation. Many more useful effects are presented below.

*Decoding to loudspeakers/headphones*. To map the modified Ambisonic signal $\tilde{\boldsymbol{\chi}}_{\tilde{\mathrm{N}}}$ to loudspeakers or headphones, an Ambisonic decoder is needed as discussed in the previous chapter. For decoding to headphones, one should either decode to as few HRIR directions as possible [7, 8] before the signals get convolved and mixed, to avoid coloration at frontal directions where the delays in the HRIRs change too strongly over direction to be resolved properly [9, 10]. Alternatively, the approach in [11] proposes removal of the HRIR delay at high frequencies and diffuse-field covariance equalization by a 2 × 2 filter system, cf. Sect. 4.11.

#### **5.1 Embedding of Channel-Based, Spot-Microphone, and First-Order Recordings**

Microphone arrays for near-coincident higher-order Ambisonic recording based on holography will be discussed in the subsequent chapter. Nevertheless, it is possible to use (i) spot and close microphones and encode their direction into the directional panorama, (ii) first-order microphone arrays to fill the Ambisonic channels only up to the first order, and (iii) more classical non-coincident or equivalence-stereophonic microphone arrays whose typical playback directions are encoded in Ambisonics.

The study by Kurz et al. [12] investigated how recordings by first-order encoding of the Soundfield ST450 and the Oktava MK4012 tetrahedral microphone arrays compare to the equivalence-stereophonic ORTF, see Fig. 5.2. In addition, ORTF-like mapping of the Oktava MK4012's frontal signals to the ±30◦ directions in 5th order was tested instead of its first-order encoding. Figure 5.3 shows the results of the study in terms of the perceptual attributes localization and spatial depth. It seems that a mixture of ORTF-like 5th-order encoding and first-order encoding of the MK4012 microphone achieves preferred results, while the first-order encoded output of the ST450 Soundfield microphone is rated fair in both attributes; the ORTF microphone only ranked well in terms of localization. The results of the ST450 were independent of its orientation, whereas the localization of the first-order-encoded MK4012 was found to depend on the orientation. This dependency of the MK4012 arises because its microphones are not sufficiently coincident.

As a bottom line of the detailed analysis, one should be encouraged to keep using classical microphone techniques where known to be appropriate and encode their output in higher-order beds or virtual playback directions. However, this should be done with the awareness that stereophonic recording won't necessarily work for a

**Fig. 5.2** Ensemble of the Ambisonic and reference microphones of the study by Kurz et al. [12]; the pixelized microphone prototype by AKG was excluded from the study

**Fig. 5.3** Median values and 95% confidence intervals for each attribute from experiments in [12] for different microphones, orientations, and playback processing

large audience area, for which the robustness in directional mapping of equivalence-based techniques seems to be attractive.

An interesting layout is, e.g., specified in Hendrickx et al.'s work [13], in which they use an equivalence-stereophonic six-channel microphone array. Another interesting idea was used at the ICSA Ambisonics Summer School 2017: a height layer of suitably inclined super-cardioid microphones was added at a small vertical distance to the horizontal microphone layer, similar to the upwards-pointing directional microphones suggested in Lee's and Wallis' work [14, 15], to provide sufficiently attenuated horizontal sounds to the height layer.

**Fig. 5.4** Median values and 95% confidence intervals of listening experiment comparing channelbased orchestra recordings on headphone playback, either directly rendered using the corresponding HRIRs or via binaural Ambisonic decoding of different orders

*Binaural rendering study using surround-with-height material*. In another study by Lee, Frank, and Zotter [16], static headphone-based rendering of channel-based recordings was compared using direct HRIR-based rendering or Ambisonics-based binaural rendering, cf. Sect. 4.11. The aim was to find whether differently recorded material could be rendered at high quality via binaural Ambisonics renderers, or under which settings this would imply quality degradation when compared to channel-based binaural rendering.

The results from the half of the listening experiment done in Graz are analyzed in Fig. 5.4. The renderers compared were channel-based rendering ("ref"), a low-passed mono anchor designed to have poor quality ("0"), a first-order binaural Ambisonic renderer ("1c") based on a cube layout with loudspeakers at ±45◦, ±135◦ azimuth and ±35.3◦ elevation, and MagLS binaural Ambisonic renderers at the orders "1", "2", "3", "4", and "5". Obviously, for orders 2 and above, there is not much quality degradation compared to the reference channel-based binaural rendering. The spatial quality cannot be distinguished from the reference for MagLS with Ambisonic orders 3 and above, and the timbral quality cannot be distinguished for Ambisonic orders 2 and above.

While this result simplifies the practical requirements for headphone playback remarkably, it can be supposed that due to the limited sweet spot size, loudspeaker playback would still require higher orders, typically.

#### **5.2 Frequency-Independent Ambisonic Effects**

Many frequency- and time-independent Ambisonic effects are based on the aforementioned re-mapping of directions and manipulation of directional amplitudes, see e.g. Kronlachner's thesis [2, 17]; advanced effects can be found in [18]. In general, the surround-sound signal can be manipulated by any conceivable transformation that modifies the directional mapping and amplitude of its contents. The formulation

$$\tilde{x}(\tilde{\boldsymbol{\theta}}, t) = g(\boldsymbol{\theta})\, x(\boldsymbol{\theta}, t) \tag{5.3}$$

expresses an operation that is able to pick out every direction $\boldsymbol{\theta}$ of the input signal, weight its signal by a directional gain $g(\boldsymbol{\theta})$, and re-map it to a new direction $\tilde{\boldsymbol{\theta}} = \boldsymbol{\tau}\{\boldsymbol{\theta}\}$ within a transformed signal $\tilde{x}$. To find out how this affects Ambisonic signals, we write both $x$ and $\tilde{x}$ as Ambisonic signals $x(\boldsymbol{\theta}, t) = \boldsymbol{y}_\mathrm{N}^\mathrm{T}(\boldsymbol{\theta})\, \boldsymbol{\chi}_\mathrm{N}(t)$ and $\tilde{x}(\tilde{\boldsymbol{\theta}}, t) = \boldsymbol{y}_{\tilde{\mathrm{N}}}^\mathrm{T}(\tilde{\boldsymbol{\theta}})\, \tilde{\boldsymbol{\chi}}_{\tilde{\mathrm{N}}}(t)$ expanded in spherical/circular harmonics,

$$\boldsymbol{y}_{\tilde{\mathrm{N}}}^{\mathrm{T}}(\tilde{\boldsymbol{\theta}})\, \tilde{\boldsymbol{\chi}}_{\tilde{\mathrm{N}}}(t) = g(\boldsymbol{\theta})\, \boldsymbol{y}_{\mathrm{N}}^{\mathrm{T}}(\boldsymbol{\theta})\, \boldsymbol{\chi}_{\mathrm{N}}(t),$$

and use $\int_{\mathbb{S}^\mathrm{D}} \boldsymbol{y}_{\tilde{\mathrm{N}}}(\tilde{\boldsymbol{\theta}})\, \boldsymbol{y}_{\tilde{\mathrm{N}}}^{\mathrm{T}}(\tilde{\boldsymbol{\theta}})\, \mathrm{d}\tilde{\boldsymbol{\theta}} = \boldsymbol{I}$ by applying the integral $\int_{\mathbb{S}^\mathrm{D}} \boldsymbol{y}_{\tilde{\mathrm{N}}}(\tilde{\boldsymbol{\theta}})\, (\cdot)\, \mathrm{d}\tilde{\boldsymbol{\theta}}$ to both sides, to get $\tilde{\boldsymbol{\chi}}_{\tilde{\mathrm{N}}}(t)$ on the left

$$\tilde{\boldsymbol{\chi}}_{\tilde{\mathrm{N}}}(t) = \underbrace{\int_{\mathbb{S}^{\mathrm{D}}} \boldsymbol{y}_{\tilde{\mathrm{N}}}(\tilde{\boldsymbol{\theta}})\, g(\boldsymbol{\theta})\, \boldsymbol{y}_{\mathrm{N}}^{\mathrm{T}}(\boldsymbol{\theta})\, \mathrm{d}\tilde{\boldsymbol{\theta}}}_{=\,\boldsymbol{T}}\; \boldsymbol{\chi}_{\mathrm{N}}(t) = \boldsymbol{T}\, \boldsymbol{\chi}_{\mathrm{N}}(t) \tag{5.4}$$

to find that the transformed signals are just the Ambisonic input signals re-mixed by the matrix $\boldsymbol{T}$ (note that lossless operation might require an increased Ambisonic order $\tilde{\mathrm{N}}$). Numerical evaluation of the matrix $\boldsymbol{T} = \int_{\mathbb{S}^\mathrm{D}} \boldsymbol{y}_{\tilde{\mathrm{N}}}(\tilde{\boldsymbol{\theta}})\, g(\boldsymbol{\theta})\, \boldsymbol{y}_{\mathrm{N}}^{\mathrm{T}}(\boldsymbol{\theta})\, \mathrm{d}\tilde{\boldsymbol{\theta}}$ is best done by using a high-enough $t$-design $\boldsymbol{\Theta} = [\boldsymbol{\theta}_l]$ to discretize the integration variable $\tilde{\boldsymbol{\theta}} = \boldsymbol{\tau}\{\boldsymbol{\theta}\}$. For the discretized output directions, an inverse mapping $\boldsymbol{\theta} = \boldsymbol{\tau}^{-1}\{\tilde{\boldsymbol{\theta}}\}$ back to the input directions must exist (the directional re-mapping must be bijective), so that we can write

$$\boldsymbol{T} = \int_{\mathbb{S}^{\mathrm{D}}} \boldsymbol{y}_{\tilde{\mathrm{N}}}(\tilde{\boldsymbol{\theta}})\, g\bigl(\boldsymbol{\tau}^{-1}\{\tilde{\boldsymbol{\theta}}\}\bigr)\, \boldsymbol{y}_{\mathrm{N}}^{\mathrm{T}}(\boldsymbol{\tau}^{-1}\{\tilde{\boldsymbol{\theta}}\})\, \mathrm{d}\tilde{\boldsymbol{\theta}} = \frac{4\pi}{\hat{\mathrm{L}}}\, \boldsymbol{Y}_{\tilde{\mathrm{N}},\boldsymbol{\Theta}}\, \mathrm{diag}\{\boldsymbol{g}_{\boldsymbol{\tau}^{-1}\{\boldsymbol{\Theta}\}}\}\, \boldsymbol{Y}_{\mathrm{N},\boldsymbol{\tau}^{-1}\{\boldsymbol{\Theta}\}}^{\mathrm{T}}.$$

This formalism is generic and covers both simplistic and more complex tasks. It helps one understand that every frequency-independent directional weighting and/or re-mapping just re-mixes the Ambisonic signals by a matrix, as in Fig. 5.6a.

The ambix VST plugin suite implements several effects, e.g. in the VST plugins ambix\_mirror, ambix\_rotate, ambix\_directional\_loudness, ambix\_warp. The sections below explain how these and other effects work inside.
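To make the formalism concrete without the full machinery of spherical harmonics, the following sketch (our illustration, not from the book) computes the re-mixing matrix $\boldsymbol{T}$ for the 2D case using orthonormal circular harmonics; equidistant sampling of the circle plays the role of the $t$-design, and the function and parameter names are illustrative:

```python
import numpy as np

def circ_harmonics(N, phi):
    """Orthonormal circular harmonics up to order N at angles phi, stacked
    as rows [1, cos(phi), sin(phi), cos(2 phi), ...] with unit-norm scaling."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full(phi.shape, 1.0 / np.sqrt(2.0 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.vstack(rows)

def transform_matrix(N, gain, inv_map, L=None):
    """Discretized T = (2 pi / L) * Y(phi_out) diag(g) Y(inv_map(phi_out))^T,
    the 2D analogue of the t-design-sampled integral for T."""
    L = L or (4 * N + 4)                        # enough samples for order-2N products
    phi_out = 2.0 * np.pi * np.arange(L) / L    # discretized output directions
    phi_in = inv_map(phi_out)                   # bijective inverse re-mapping
    Y_out = circ_harmonics(N, phi_out)
    Y_in = circ_harmonics(N, phi_in)
    return (2.0 * np.pi / L) * Y_out @ (gain(phi_in)[:, None] * Y_in.T)
```

With a neutral gain and a rotation as re-mapping, applying the resulting matrix to an encoded source reproduces the source encoded at the rotated direction, as the formalism predicts.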

#### *5.2.1 Mirror*

Mirroring does not actually require the generic re-mapping and re-weighting formalism from above. The spherical harmonics associated with the Ambisonic channels are shown in Fig. 4.12, and upon closer inspection one recognizes their symmetries, see Fig. 5.5. To mirror the Ambisonic sound scene with regard to planes of symmetry, it is sufficient to sign-invert the channels associated with odd-symmetric spherical harmonics as in Fig. 5.6b. Formally, the transform matrix is just a diagonal matrix $\boldsymbol{T} = \mathrm{diag}\{\boldsymbol{c}\}$ with the corresponding sign-change sequence $\boldsymbol{c}$.

**Fig. 5.5** Ambisonic signals associated with odd-symmetric spherical harmonics are sign-inverted to mirror the sound scene. For every Cartesian axis, the illustrations above show spherical harmonics up to the third order, with the order index *n* organized in rows and the mode index *m* in columns. Even harmonics are blurred for visual distinction

*Up-down:* For instance, spherical harmonics with $|m| = n$ are even-symmetric with regard to $z = 0$ (up-down), and from this index on, every second harmonic in $m$ is. To flip up and down, it is therefore sufficient to invert the signs of the spherical harmonics that are odd-symmetric with regard to $z = 0$; they are characterized by $n + m$ being an odd number, i.e. $c_{nm} = (-1)^{n+m}$.

*Left-right:* The sine-related spherical harmonics with $m < 0$ are odd-symmetric with regard to $y = 0$ (left-right); therefore, sign-inverting the signals with index $m < 0$ exchanges left and right in the Ambisonic surround signal, i.e. $c_{nm} = (-1)^{(m<0)}$.

**Fig. 5.6** Block diagrams of frequency-independent transformations such as re-mapping and reweighting (left, matrix operations), or mirroring (right, sign-only operations)

*Front-back:* Every harmonic with odd $m > 0$ is odd-symmetric with regard to $x = 0$ (front-back), and so is every harmonic with even $m < 0$. Inverting the sign of these harmonics, $c_{nm} = (-1)^{m+(m<0)}$, flips front and back in the Ambisonic surround signal.
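The three sign sequences can be generated programmatically; this small sketch (our illustration, assuming ACN channel ordering) returns the diagonal of $\boldsymbol{T}$ for each mirror plane:

```python
import numpy as np

def mirror_signs(N, plane):
    """Sign-change sequence c (the diagonal of T) for mirroring an
    ACN-ordered Ambisonic signal of order N at a symmetry plane:
    'ud' (z=0, up-down), 'lr' (y=0, left-right), 'fb' (x=0, front-back)."""
    c = []
    for n in range(N + 1):
        for m in range(-n, n + 1):
            if plane == 'ud':
                c.append((-1) ** (n + m))       # odd n+m flips at z=0
            elif plane == 'lr':
                c.append(-1 if m < 0 else 1)    # sine terms flip at y=0
            else:  # 'fb'
                c.append((-1) ** (abs(m) + (m < 0)))
    return np.array(c)
```

At first order (ACN channels W, Y, Z, X), the three sequences flip exactly the Z, Y, and X channel, respectively.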

#### *5.2.2 3D Rotation*

Rotation can be expressed by a general rotation matrix $\boldsymbol{R}$ consisting of a rotation around $z$ by $\chi$, around $y$ by $\vartheta$, and again around $z$ by $\varphi$, see Fig. 5.7. This rotation matrix maps every direction $\boldsymbol{\theta}$ to a rotated direction $\tilde{\boldsymbol{\theta}}$:

$$\begin{aligned} \tilde{\boldsymbol{\theta}} &= \boldsymbol{R}(\varphi, \vartheta, \chi)\, \boldsymbol{\theta}, \\ \boldsymbol{R} &= \begin{bmatrix} \cos\varphi & -\sin\varphi & 0 \\ \sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\vartheta & 0 & -\sin\vartheta \\ 0 & 1 & 0 \\ \sin\vartheta & 0 & \cos\vartheta \end{bmatrix} \begin{bmatrix} \cos\chi & -\sin\chi & 0 \\ \sin\chi & \cos\chi & 0 \\ 0 & 0 & 1 \end{bmatrix}. \end{aligned}$$

Using this as a transform rule $\tilde{\boldsymbol{\theta}} = \boldsymbol{\tau}\{\boldsymbol{\theta}\} = \boldsymbol{R}\, \boldsymbol{\theta}$ with neutral gain $g(\boldsymbol{\theta}) = 1$, we find the transform matrix by the inverse mapping $\boldsymbol{\theta} = \boldsymbol{R}^{\mathrm{T}} \tilde{\boldsymbol{\theta}}$ as

**Fig. 5.7** *zyz*-Rotation on the plain example of great-circle navigation of a paper plane around the earth. With the original location at the zenith, a first rotation around *z* determines the course, and the subsequent rotations around *y* and *z* relocate the plane in zenith and azimuth

$$\boldsymbol{T} = \frac{4\pi}{\hat{\mathrm{L}}}\, \boldsymbol{Y}_{\mathrm{N},\boldsymbol{\Theta}}\, \boldsymbol{Y}_{\mathrm{N},\boldsymbol{R}^{\mathrm{T}}\boldsymbol{\Theta}}^{\mathrm{T}}. \tag{5.6}$$

Using the $\hat{\mathrm{L}}$ directions of a $t \geq 2\mathrm{N}$-design is sufficient to sample the harmonics accurately. With the resulting $\boldsymbol{T}$, rotation is implemented as in Fig. 5.6a.

There is plenty of potential for simplification: As only the spherical harmonics of a given order $n$ are required to re-express a rotated spherical harmonic of the same order $n$, $\boldsymbol{T}$ is actually block-diagonal, $\boldsymbol{T} = \mathrm{blkdiag}_n\{\boldsymbol{T}_n\}$, and within each spherical harmonic order, the integral could be evaluated more efficiently using a smaller $t \geq 2n$-design. Moreover, there are various fast and recursive ways to calculate the entries of $\boldsymbol{T}$, as in [19–25], implemented in most plugins. And yet, in practice a naïve implementation can be fast enough and pragmatic.

*Rotation around z*. One special case of rotation is important and particularly simple to implement. A directional encoding in azimuth always either equals $\Phi_m(\varphi_\mathrm{s})$ in 2D, or contains it in 3D. For $m > 0$, the azimuth encoding $\Phi_m(\varphi_\mathrm{s})$ depends on $\cos m\varphi_\mathrm{s}$, and its negative-sign version $\Phi_{-m}(\varphi_\mathrm{s})$ depends on $\sin(|m|\varphi_\mathrm{s})$. The encoding angle can be offset by the trigonometric addition theorems, which can be written as a matrix:

$$\begin{bmatrix} \sin m(\varphi_\mathrm{s} + \varphi) \\ \cos m(\varphi_\mathrm{s} + \varphi) \end{bmatrix} = \underbrace{\begin{bmatrix} \cos m\varphi & \sin m\varphi \\ -\sin m\varphi & \cos m\varphi \end{bmatrix}}_{\boldsymbol{R}(m\varphi)} \begin{bmatrix} \sin m\varphi_\mathrm{s} \\ \cos m\varphi_\mathrm{s} \end{bmatrix}. \tag{5.7}$$

By this, any Ambisonic signal, be it 2D or 3D, can be rotated around *z* by the matrices *R*(*m*ϕ) for the signal pairs with ±*m*.

$$\begin{bmatrix} \Phi_{-m}(\varphi_\mathrm{s}+\varphi) \\ \Phi_{m}(\varphi_\mathrm{s}+\varphi) \end{bmatrix} = \boldsymbol{R}(m\varphi) \begin{bmatrix} \Phi_{-m}(\varphi_\mathrm{s}) \\ \Phi_{m}(\varphi_\mathrm{s}) \end{bmatrix}, \qquad \begin{bmatrix} Y_{n}^{-m}(\varphi_\mathrm{s}+\varphi,\vartheta_\mathrm{s}) \\ Y_{n}^{m}(\varphi_\mathrm{s}+\varphi,\vartheta_\mathrm{s}) \end{bmatrix} = \boldsymbol{R}(m\varphi) \begin{bmatrix} Y_{n}^{-m}(\varphi_\mathrm{s},\vartheta_\mathrm{s}) \\ Y_{n}^{m}(\varphi_\mathrm{s},\vartheta_\mathrm{s}) \end{bmatrix}. \tag{5.8}$$

Figure 5.8a shows the processing scheme implementing only the non-zero entries of the associated matrix operation *T*. Combined with a fixed set of 90◦ rotations around *y* (read from files), it can be used to access all rotational degrees of freedom in 3D [20].

The rotation effect is one of the most important features of head-tracked interactive VR playback on headphones. Here, rotation counteracting the head movement serves to support the impression of a static image of the virtual outside world.
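The pairwise rotation of Eqs. (5.7) and (5.8) can be sketched in a few lines (our illustration, not from the book), assuming ACN channel ordering so that the channel index is $\mathrm{acn} = n^2 + n + m$:

```python
import numpy as np

def rotate_z(ambi, phi):
    """Rotate an ACN-ordered Ambisonic signal, shape [(N+1)^2, T],
    around z by phi (radians): applies the 2x2 rotation R(m*phi) of
    Eq. (5.7) to each channel pair (n, -m), (n, +m)."""
    out = ambi.copy()
    N = int(round(np.sqrt(ambi.shape[0]))) - 1
    for n in range(1, N + 1):
        for m in range(1, n + 1):
            c, s = np.cos(m * phi), np.sin(m * phi)
            i_sin = n * n + n - m       # Y_n^{-m}: the sin(m*phi_s) channel
            i_cos = n * n + n + m       # Y_n^{+m}: the cos(m*phi_s) channel
            out[i_sin] = c * ambi[i_sin] + s * ambi[i_cos]
            out[i_cos] = -s * ambi[i_sin] + c * ambi[i_cos]
    return out
```

Channels with $m = 0$ are left untouched, as they are rotationally symmetric around $z$.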

#### *5.2.3 Directional Level Modification/Windowing*

What might be most important when mixing is the option to treat the gains of different directions differently: it might be necessary to attenuate directions of uninteresting or disturbing content while boosting directions of a soft target signal. For such a

**Fig. 5.8** Rotation around *z* and Ambisonic widening/diffuseness apply simple 2 × 2 rotation matrices/filter matrices to each Ambisonic signal pair $\chi_{n,m}$, $\chi_{n,-m}$ of the same order *n*. Note that the order of the plotted input/output channels is not the typical ACN sequence, to avoid crossing connections and hereby simplify the diagram

manipulation, the directional re-mapping is neutral, $\tilde{\boldsymbol{\theta}} = \boldsymbol{\theta}$, and the transform defining the matrix $\boldsymbol{T}$ that is implemented as in Fig. 5.6a remains

$$\boldsymbol{T} = \frac{4\pi}{\hat{\mathrm{L}}}\, \boldsymbol{Y}_{\tilde{\mathrm{N}},\boldsymbol{\Theta}}\, \mathrm{diag}\{\boldsymbol{g}_{\boldsymbol{\Theta}}\}\, \boldsymbol{Y}_{\mathrm{N},\boldsymbol{\Theta}}^{\mathrm{T}}. \tag{5.9}$$

In the simplest version, as implemented in ambix\_directional\_loudness, the gain function consists of just two mutually exclusive regions, e.g. a region of aperture $\alpha$ around the direction $\boldsymbol{\theta}_\mathrm{g}$, and the complementary region outside, with separately controlled gains $g_\mathrm{in}$ and $g_\mathrm{out}$:

$$g(\boldsymbol{\theta}) = g_{\mathrm{in}}\, u\bigl(\boldsymbol{\theta}^{\mathrm{T}}\boldsymbol{\theta}_{\mathrm{g}} - \cos\tfrac{\alpha}{2}\bigr) + g_{\mathrm{out}}\, u\bigl(\cos\tfrac{\alpha}{2} - \boldsymbol{\theta}^{\mathrm{T}}\boldsymbol{\theta}_{\mathrm{g}}\bigr), \tag{5.10}$$

where $u(x)$ represents the unit-step function that is 1 for $x \geq 0$ and 0 otherwise. Note that the output Ambisonic order $\tilde{\mathrm{N}}$ of this effect would need to be larger than N to be lossless. However, with reasonably chosen sizes $\alpha$ and gain ratios $g_\mathrm{in}/g_\mathrm{out}$, the effect will nevertheless produce reasonable results. Figure 5.9 shows a window at azimuth and elevation of 22.5◦ with an aperture of 50◦ using $g_\mathrm{in} = 1$ and $g_\mathrm{out} = 0$ and the order N = 10, with a grid of encoded directions to illustrate the influence of the transformation.
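The two-region gain function of Eq. (5.10) is straightforward to evaluate on the sampled directions; a minimal sketch (our illustration, with hypothetical names):

```python
import numpy as np

def window_gain(theta, theta_g, alpha, g_in=1.0, g_out=0.0):
    """Two-region directional gain of Eq. (5.10): g_in inside the cap of
    aperture alpha around the unit vector theta_g, g_out outside.
    theta: array of unit direction vectors, shape [L, 3]."""
    inside = theta @ theta_g >= np.cos(alpha / 2.0)   # u(theta^T theta_g - cos(alpha/2))
    return np.where(inside, g_in, g_out)
```

These gains would form the `diag{g}` entries in the transform matrix of Eq. (5.9).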

*For reference: the entries of the tensor used to analytically re-expand the product of two spherical functions $x(\boldsymbol{\theta})\, g(\boldsymbol{\theta})$ given by their spherical harmonic coefficients $\chi_{nm}, \gamma_{nm}$ are called Gaunt coefficients or Clebsch-Gordan coefficients* [6, 26].

**Fig. 5.9** Directionally windowed Ambisonic test image at every 90◦ in azimuth, interleaved in azimuth for the elevations ±60◦ and ±22.5◦, using the orders $\mathrm{N} = \tilde{\mathrm{N}} = 10$, a window size of $\frac{\alpha}{2} = 50◦$ around azimuth and elevation of 0◦, and max-*r*E weighting

#### *5.2.4 Warping*

Gerzon [27, Eq. 4a] described the effect *dominance* that is meant to warp the Ambisonic surround scene to modify how vitally the essential parts in front of the scene are presented.

*Warping wrt. a direction.* For mathematical simplicity, we describe this bilinear warping with regard to the *z* direction. To warp with regard to the frontal direction, one first rotates the front upwards, applies the warping operation there, and then rotates back. The bilinear warping modifies the normalized *z* coordinate $\zeta = \cos\vartheta = \theta_\mathrm{z}$ so that signals from the horizon $\zeta = 0$ are pulled to $\tilde{\zeta} = \alpha$,

$$\tilde{\zeta} = \frac{\alpha + \zeta}{1 + \alpha\zeta}, \tag{5.11}$$

while the poles stay where they were, $\tilde{\zeta} = \pm 1$ for $\zeta = \pm 1$. Hereby, the surround signal gets squeezed towards or stretched away from the zenith, or, when rotating before and after: towards/from any direction.

The integral can be discretized and solved by a suitable *t*-design as before, only that for lossless operation, the output order $\tilde{\mathrm{N}}$ must be higher than the input order N. The matrix $\boldsymbol{T}$ that is implemented as in Fig. 5.6a is computed by

$$\boldsymbol{T} = \frac{4\pi}{\hat{\mathrm{L}}}\, \boldsymbol{Y}_{\tilde{\mathrm{N}},\boldsymbol{\Theta}}\, \mathrm{diag}\{\boldsymbol{g}_{\boldsymbol{\tau}^{-1}\{\boldsymbol{\Theta}\}}\}\, \boldsymbol{Y}_{\mathrm{N},\boldsymbol{\tau}^{-1}\{\boldsymbol{\Theta}\}}^{\mathrm{T}}. \tag{5.12}$$

The inverse mapping yields

$$\zeta = \tau^{-1}\{\tilde{\zeta}\} = \frac{\tilde{\zeta} - \alpha}{1 - \alpha\tilde{\zeta}}, \tag{5.13}$$

and it modifies the coordinates of the *t*-design inserted for $\tilde{\boldsymbol{\theta}}_l = \boldsymbol{\theta}_l = [\theta_{\mathrm{x},l}, \theta_{\mathrm{y},l}, \theta_{\mathrm{z},l}]^{\mathrm{T}}$ with $\tilde{\zeta}_l = \theta_{\mathrm{z},l}$ accordingly

**Fig. 5.10** Warping of the horizontal plane by 22.5◦ downwards; original Ambisonic test image contains points at every 90◦ in azimuth, interleaved in azimuth for the elevations ±60◦ and ±22.5◦; orders are N = N˜ = 10; max-*r*<sup>E</sup> weighted

$$\boldsymbol{\tau}^{-1}\{\tilde{\boldsymbol{\theta}}_{l}\} = \begin{bmatrix} \theta_{\mathrm{x},l}\,\sqrt{1 - (\tau^{-1}\{\tilde{\zeta}_{l}\})^{2}} \\ \theta_{\mathrm{y},l}\,\sqrt{1 - (\tau^{-1}\{\tilde{\zeta}_{l}\})^{2}} \\ \tau^{-1}\{\tilde{\zeta}_{l}\} \end{bmatrix} = \begin{bmatrix} \theta_{\mathrm{x},l}\,\sqrt{1 - \bigl(\frac{\theta_{\mathrm{z},l} - \alpha}{1 - \alpha\theta_{\mathrm{z},l}}\bigr)^{2}} \\ \theta_{\mathrm{y},l}\,\sqrt{1 - \bigl(\frac{\theta_{\mathrm{z},l} - \alpha}{1 - \alpha\theta_{\mathrm{z},l}}\bigr)^{2}} \\ \frac{\theta_{\mathrm{z},l} - \alpha}{1 - \alpha\theta_{\mathrm{z},l}} \end{bmatrix}. \tag{5.14}$$

The gain $g(\tilde{\zeta})$ of the generic transformation is useful to preserve the loudness of what becomes wider and therefore louder in terms of the *E* measure after re-mapping. To preserve loudness, the resulting surround signal is divided by the square root of the stretch applied, which is related to the slope of the mapping by $\frac{1}{g^2} = \frac{\mathrm{d}\zeta}{\mathrm{d}\tilde{\zeta}}$. Expressed as de-emphasis gain, we get

$$g(\tilde{\zeta}_{l}) = \frac{1 - \alpha\tilde{\zeta}_{l}}{\sqrt{1 - \alpha^{2}}} = \frac{1 - \alpha\theta_{\mathrm{z},l}}{\sqrt{1 - \alpha^{2}}}. \tag{5.15}$$

Figure 5.10 shows warping of the horizontal plane by 20◦ downwards, using the test image parameters as with windowing; de-emphasis attenuates widened areas.

In the same fashion, Kronlachner [17] describes another warping curve that warps with regard to a fixed horizontal plane and the poles, either squeezing or stretching the content towards or away from the horizon, symmetrically for the upper and lower hemispheres (second option of the ambix\_warp plugin).
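The bilinear warping curve of Eq. (5.11) and its inverse, Eq. (5.13), are simple enough to sketch directly (our illustration, not from the book):

```python
import numpy as np

def warp(zeta, alpha):
    """Bilinear warp of Eq. (5.11): pulls the horizon zeta = 0 to alpha
    while keeping the poles zeta = +-1 fixed."""
    return (alpha + zeta) / (1.0 + alpha * zeta)
```

Note that the inverse mapping of Eq. (5.13) is simply the same curve with the sign of alpha flipped, so `warp(warp(z, a), -a)` recovers `z`.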

#### **5.3 Parametric Equalization**

There are two ways of employing parametric equalizers on Ambisonic signals: either the single- or multi-channel input of a mono or multiple-input encoder is filtered by parametric equalizers before encoding, or each of the Ambisonic signal's channels is filtered by the same parametric equalizer, see Fig. 5.11a.

**Fig. 5.11** Block diagram of processing that commonly and equally affects all Ambisonic signals, such as parametric equalization and dynamic processing (compression), without recombining the signals

Bass management is often important to avoid overdriving the smaller loudspeakers of, e.g., a 5th-order hemispherical playback system with subwoofer signals. All 36 channels from the Ambisonic bus can be sent to a decoder section, in which the frequencies below 70–100 Hz are removed by a 4th-order high-pass filter before running through the Ambisonic decoder. The first channel of the Ambisonic bus alone, the omnidirectional channel, is sent to a subwoofer section, in which a 4th-order low-pass filter removes the frequencies above 70–100 Hz before the signal is sent to the subwoofers. If the playback system is time-aligned between subwoofer and higher frequencies, the 4th-order crossovers should be Linkwitz–Riley filters (squared Butterworth high-pass or low-pass filters) to preserve phase equality [28].
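A 4th-order Linkwitz–Riley crossover can be sketched by applying a 2nd-order Butterworth biquad twice per band (our illustration, not from the book; the crossover frequency and function names are assumptions). The biquads are derived by the bilinear transform with frequency prewarping, so the sketch stays dependency-free:

```python
import numpy as np

def butter2_coeffs(fc, fs, highpass=False):
    """Biquad coefficients of a 2nd-order Butterworth low-/high-pass,
    obtained by bilinear transform with frequency prewarping."""
    w = np.tan(np.pi * fc / fs)                     # prewarped corner frequency
    a0 = 1.0 + np.sqrt(2.0) * w + w * w
    b = (np.array([1.0, -2.0, 1.0]) if highpass
         else np.array([w * w, 2 * w * w, w * w])) / a0
    a = np.array([1.0, 2.0 * (w * w - 1.0) / a0,
                  (1.0 - np.sqrt(2.0) * w + w * w) / a0])
    return b, a

def biquad(b, a, x):
    """Direct-form I biquad filter."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = b[0] * x[n]
        if n > 0:
            y[n] += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n > 1:
            y[n] += b[2] * x[n - 2] - a[2] * y[n - 2]
    return y

def lr4_split(x, fc, fs):
    """4th-order Linkwitz-Riley crossover: each band applies its 2nd-order
    Butterworth twice; the low and high outputs sum to an allpass."""
    bl, al = butter2_coeffs(fc, fs, highpass=False)
    bh, ah = butter2_coeffs(fc, fs, highpass=True)
    return (biquad(bl, al, biquad(bl, al, x)),
            biquad(bh, ah, biquad(bh, ah, x)))
```

The defining property of the Linkwitz–Riley pair is that the recombined low and high outputs have a flat magnitude response, which is what makes it suitable for splitting the omnidirectional channel off to the subwoofers.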

For more information on parametric equalizers, the reader is referred to Udo Zölzer's book on Digital audio effects [29].

#### **5.4 Dynamic Processing/Compression**

Individual compression of different Ambisonic channels would destroy the directional consistency of the Ambisonic signal. Consequently, dynamic processing should rather affect the levels of all Ambisonic channels in the same way. As it typically contains all the audio signals, it is useful to have the first, omnidirectional Ambisonic channel control the dynamic processor as side-chain input, see Fig. 5.11b. For more information on dynamic processing, the reader is referred to Udo Zölzer's book on Digital audio effects [29].

Moreover, it is sometimes useful to compress, e.g., the vocals of a singer separately. To this end, directional compression first extracts part of the Ambisonic signals by a directional window, creating one set of Ambisonic signals without the directional region of the window, and another one exclusively containing it. The compression is applied to the resulting window signal before re-combining it with the remaining signals.
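The omni-side-chained processing of Fig. 5.11b can be sketched with a simplistic feed-forward compressor (our illustration, not from the book; the envelope detector, parameter names, and defaults are assumptions):

```python
import numpy as np

def sidechain_compress(chi, fs, thresh_db=-20.0, ratio=4.0,
                       attack=0.005, release=0.1):
    """Broadband compression of all Ambisonic channels, with the first
    (omnidirectional) channel chi[0] as side-chain; applying one common
    gain to every channel preserves the directional consistency."""
    a_att = np.exp(-1.0 / (attack * fs))
    a_rel = np.exp(-1.0 / (release * fs))
    env = 0.0
    g = np.ones(chi.shape[1])
    for t in range(chi.shape[1]):
        x = abs(chi[0, t])
        a = a_att if x > env else a_rel
        env = a * env + (1.0 - a) * x                  # level detector on W
        lev_db = 20.0 * np.log10(max(env, 1e-9))
        over_db = max(0.0, lev_db - thresh_db)
        g[t] = 10.0 ** (-over_db * (1.0 - 1.0 / ratio) / 20.0)
    return chi * g
```

Because one gain trajectory multiplies all channels, the ratios between the Ambisonic channels, and hence the directional image, remain untouched.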

#### **5.5 Widening (Distance/Diffuseness/Early Lateral Reflections)**

Basic widening and diffuseness effects can be regarded as inspired by Gerzon [30] and Laitinen [31], who proposed to apply frequency-dependent panning filters, mapping different frequencies to directions dispersed around the panning direction. The resulting effect is fundamentally different from and superior to frequency-independent spreading, i.e. MDAP with enlarged spread or Ambisonics with reduced order, which could yield audible comb filtering.

To apply this technique to Ambisonics, Zotter et al. [32] proposed to employ a dispersive, i.e. frequency-dependent, rotation of the Ambisonic scene around the *z*-axis as in Eq. (5.8) by the matrix $\boldsymbol{R}$ as described above and in Fig. 5.8b, using 2 × 2 matrices of filters to implement the frequency-dependent argument $m\hat{\varphi}\cos\omega\tau$

$$\boldsymbol{R}(m\hat{\varphi}\cos\omega\tau) = \begin{bmatrix} \cos(m\hat{\varphi}\cos\omega\tau) & \sin(m\hat{\varphi}\cos\omega\tau) \\ -\sin(m\hat{\varphi}\cos\omega\tau) & \cos(m\hat{\varphi}\cos\omega\tau) \end{bmatrix}, \tag{5.16}$$

whose parameters $\hat{\varphi}$ and $\tau$ allow controlling the magnitude and change rate of the rotation with increasing frequency. How this filter matrix is implemented efficiently was described in [33], where the sinusoidally frequency-varying pair of functions

$$g_1(\omega) = \cos\left[\alpha\cos(\omega\tau)\right], \qquad g_2(\omega) = \sin\left[\alpha\cos(\omega\tau)\right], \tag{5.17}$$

was found to correspond to the sparse impulse responses in the time domain

$$g_1(t) = \sum_{q=-\infty}^{\infty} J_{|q|}(\alpha)\, \cos\left(\tfrac{\pi}{2}|q|\right) \delta(t - q\tau), \tag{5.18}$$

$$g_2(t) = \sum_{q=-\infty}^{\infty} J_{|q|}(\alpha)\, \sin\left(\tfrac{\pi}{2}|q|\right) \delta(t - q\tau),$$

allowing for truncation to just a few terms in $q$, typically 11 taps for $-5 \leq q \leq 5$ or fewer, and hereby an efficient implementation. For the implementation of the filter matrix, the value $\alpha = m\hat{\varphi}$ is used for each degree $m$. (*It might be helpful to recall the phase-modulated cosine and sine from radio communication, whose spectra are the same functions as this impulse-response pair.*)
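The sparse tap weights of Eq. (5.18) can be generated as follows (our illustrative sketch, not from the book); the Bessel values are computed from the integral representation to keep the sketch dependency-free:

```python
import numpy as np

def bessel_j(n, x, K=4096):
    """Bessel function J_n(x) via its integral representation
    J_n(x) = (1/pi) * int_0^pi cos(n*t - x*sin(t)) dt (midpoint rule)."""
    t = (np.arange(K) + 0.5) * np.pi / K
    return np.mean(np.cos(n * t - x * np.sin(t)))

def widening_taps(alpha, Q=5):
    """Sparse FIR taps of Eq. (5.18): weights at the delays q*tau for the
    dispersion filter pair g1 (cosine) and g2 (sine), truncated to |q| <= Q."""
    q = np.arange(-Q, Q + 1)
    J = np.array([bessel_j(abs(k), alpha) for k in q])
    g1 = J * np.cos(np.pi / 2 * np.abs(q))   # non-zero for even q
    g2 = J * np.sin(np.pi / 2 * np.abs(q))   # non-zero for odd q
    return q, g1, g2
```

Evaluating the frequency responses of these taps recovers the target functions of Eq. (5.17), which is a useful sanity check of the truncation.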

As the algorithm places successive frequencies at slightly displaced directions, the auditory source width increases. Moreover, the frequency-dependent part causes a smearing of the temporal fine structure of the signal. In [34], it was found that implementations discarding the negative values of $q$, i.e. keeping $q \geq 0$, sound more natural and still exhibit a sufficiently strong effect. Time constants $\tau$ around 1.5 ms yield a *widening* effect, and a *diffuseness* and *distance* impression is obtained with $\tau$ around 15 ms. The parameter $\hat{\varphi}$ is adjustable between 0 (no effect) and larger values; beyond 80◦ the audio quality starts to degrade. The use as a diffusing effect has turned out to be useful as a simple simulation of early lateral reflections, because most parts of the spectrum are played back near the reversal points $\pm\hat{\varphi}$ of the dispersion contour. For natural-sounding early reflections, additional shelving filters introducing attenuation of high frequencies prove useful.

Figures 5.12 and 5.13 show experimental ratings of the perceived effect strength (width or distance) of the above algorithm in [34], which was implemented as frequency-dependent (dispersive) panning on just a few loudspeakers, L = 3, 4, 5, 7, evenly arranged from −90◦ to 90◦ on the horizon at 2.5 m distance from the central listening position. The loudspeakers were controlled by a sampling decoder of the orders N = 1, 2, 3, 5 with the center of the max-*r*E-weighted panning direction at 0◦ in front. The signal was speech, and the reference "REF" was the unprocessed signal played from the frontal loudspeaker. The experiment tested the algorithm with both the symmetric impulse responses suggested by Eq. (5.18) and those truncated to their causal *q* ≥ 0 side, for a listening position at the center of the arrangement (bullet marker) and at 1.25 m shifted to the right, off-center (square marker). Figure 5.12 indicates for the widening algorithm with τ = 1.5 ms that the perceived width saturates above N > 2 at both listening positions. Despite the effect of the causal-sided

**Fig. 5.12** Perceived width (left) and audio quality (right) of frequency-dependent dispersive Ambisonic rotation as widening effect using the setting τ = 1.5 ms, the Ambisonic orders N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions at the center (bullet marker) and half-way right off-center (square marker)

**Fig. 5.13** Perceived width (left) and audio quality (right) of frequency-dependent dispersive Ambisonic rotation as distance/diffuseness effect using the setting τ = 15 ms, the Ambisonic orders N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions at the center (bullet marker) and half-way right off-center (square marker)

implementation being weaker in effect strength, it highly outperforms the symmetric FIR implementation in terms of audio quality (right diagram), while still producing a clearly noticeable effect when compared to the unprocessed reference (left diagram).

A more pronounced preference for the causal-sided implementation in terms of audio quality is found in Fig. 5.13 for the setting τ = 15 ms, where the algorithm increases the diffuseness or perceived distance for orders N > 2 at both listening positions.

#### **5.6 Feedback Delay Networks for Diffuse Reverberation**

Feedback delay networks (FDN, cf. [35, 36]) can directly be employed to create diffuse Ambisonic reverberation. A dense response and an individual reverberation for every encoded source can be expected when feeding the Ambisonic signals directly into the inputs of the FDN.

As in Fig. 5.14, FDNs consist of a matrix $\boldsymbol{A}$ that is orthogonal, $\boldsymbol{A}^{\mathrm{T}}\boldsymbol{A} = \boldsymbol{I}$, and should mix the signals of the feedback loop well enough to distribute them across all channels, coupling the resonators associated with the different delays $\tau_i$. These delays should not have common divisors, to avoid pronounced resonance frequencies, and are therefore typically chosen to be related to prime numbers. Small delays are typically selected to be more closely spaced, {2, 3, 5, ...} ms, to simulate a diffuse part with a densely spaced response at the beginning, and long delays further apart often make the reverberation more interesting. Using unity channel gains $g_{\mathrm{lo}}^{\tau_i} = g_{\mathrm{mi}}^{\tau_i} = g_{\mathrm{hi}}^{\tau_i} = 1$ and any orthogonal matrix $\boldsymbol{A}$, the reverberation time becomes infinite. For smaller channel gains, the FDN produces decaying output.

Reverberation is characterized by the exponentially decaying envelope $10^{-3\frac{t}{T_{60}}}$. For a single delay of length $\tau_i$, the corresponding gain is $g^{\tau_i}$ with $g = 10^{-\frac{3}{T_{60}}}$. This factor with the corresponding exponent provides an equal reverberation decay rate in every channel, and hereby exact control of the reverberation time. To make the effect sound natural, it is typical to adjust the gains within a high-mid-low filter set to decrease the reverberation towards higher frequency bands by the gains $g_{\mathrm{hi}}^{\tau_i} \leq g_{\mathrm{mi}}^{\tau_i} \leq g_{\mathrm{lo}}^{\tau_i}$.
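The per-delay gain can be computed directly from the reverberation time; a one-line sketch (our illustration, with $\tau_i$ and $T_{60}$ in the same time unit):

```python
def fdn_gain(tau_i, t60):
    """Per-delay feedback gain g^{tau_i} with g = 10^(-3/T60): after loop
    passes totalling t60 seconds, the envelope has decayed by 60 dB.
    tau_i and t60 are in the same time unit (e.g. seconds)."""
    return 10.0 ** (-3.0 * tau_i / t60)
```

Because the exponent is proportional to the delay length, every feedback path decays at the same rate regardless of its $\tau_i$, which is what gives the FDN its exactly controllable reverberation time.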

The vector gathering the current sample of every feedback path is multiplied by the matrix $A$. For real-time calculation, Rocchesso proposed in [37] to use a scaled Hadamard matrix $A = \frac{1}{\sqrt{M}}H$ of dimension $M = 2^k$. It consists of $\pm 1$ entries only and hereby perfectly mixes the signal across the different feedback paths to create a diffuse set of resonances. What is more, this not only replaces the $M \times M$ multiply-adds of the matrix multiplication by sums and differences, it is

**Fig. 5.14** Feedback delay network (FDN) for Ambisonic reverb. The matrix $A$ is unitary, and the gain $g = 10^{-\frac{3}{T_{60}}}$ raised to the power of the delay $\tau_i$ allows adjusting a spatially and temporally diffuse reverberation effect in different bands (lo, mi, hi)

moreover equivalent to the efficient implementation as a Fast Walsh-Hadamard Transform (FWHT), a butterfly algorithm. Figure 5.15 shows a graphical implementation example of a 16-channel FWHT in the real-time signal-processing environment Pure Data.
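To make these relations concrete, the loop can be sketched in a few lines of Python. This is a simplified illustration, not the book's reference design: the delay values, sample rate, a single broadband gain $g^{\tau_i}$ per delay (no lo/mi/hi filter bank), and the mono feed into all channels are assumptions made here for brevity.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized butterfly algorithm)."""
    x = x.copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def fdn_impulse_response(n_samples, fs=44100, t60=2.0,
                         delays_ms=(29.7, 37.1, 41.1, 43.7)):
    """Mono-input FDN sketch: per-delay gains g**tau_i and Hadamard feedback."""
    delays = [int(fs * d / 1000) for d in delays_ms]
    M = len(delays)                       # must be a power of two for the FWHT
    g = 10 ** (-3 / (t60 * fs))           # per-sample gain for the given T60
    gains = np.array([g ** d for d in delays])
    bufs = [np.zeros(d) for d in delays]  # circular delay lines
    idx = [0] * M
    out = np.zeros(n_samples)
    x = np.zeros(n_samples); x[0] = 1.0   # impulse fed to all channels
    for t in range(n_samples):
        taps = np.array([bufs[i][idx[i]] for i in range(M)])  # delay outputs
        out[t] = taps.sum()
        fb = fwht(taps * gains) / np.sqrt(M)  # orthogonal mixing A = H/sqrt(M)
        for i in range(M):
            bufs[i][idx[i]] = x[t] + fb[i]
            idx[i] = (idx[i] + 1) % len(bufs[i])
    return out
```

Since $H^\mathrm{T}H = M\,I$, the scaling by $1/\sqrt{M}$ keeps the feedback matrix orthogonal, so the decay rate is controlled by the gains $g^{\tau_i}$ alone.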

**Fig. 5.15** The fast Walsh-Hadamard transform (FWHT) variant implemented in the 16-channel feedback delay network reverberator [rev3∼] in Pure Data requires only 4 × 16 sums/differences to replace the 16 × 16 multiplies of matrix multiplication by *A*

#### **5.7 Reverberation by Measured Room Impulse Responses and Spatial Decomposition Method in Ambisonics**

The first-order spatial impulse response of a room at the listener can be improved by the resolution enhancement of the spatial decomposition method (SDM) by Tervo [38], which is a broad-band version of spatial impulse response rendering (SIRR) by Merimaa and Pulkki [39, 40]. For reliable measurements, typically loudspeakers are employed, and the typical measurement signals aren't impulses but swept-sine signals that are converted back to impulses by deconvolution. A room impulse response is typically sparse in its beginning, where direct sound and early reflections arrive at the measurement location. Generally, it is likely that the arrival times in this early part do not coincide and are well separated from each other, so that one can assume their *temporal disjointness* at the receiver.

From a room impulse response $h(t)$ that complies with this assumption, for which there consequently is a direction of arrival (DOA) $\boldsymbol{\theta}_\mathrm{DOA}(t)$ for every time instant, one could construct an Ambisonic receiver-directional room impulse response as in [41], $h(\boldsymbol{\theta}_\mathrm{R}, t) = h(t)\,\delta[1 - \boldsymbol{\theta}_\mathrm{R}^\mathrm{T}\boldsymbol{\theta}_\mathrm{DOA}(t)]$, depending on the direction $\boldsymbol{\theta}_\mathrm{R}$ at the receiver. This response can be transformed into the spherical harmonic domain by integrating it over $\int_{\mathbb{S}^2} \boldsymbol{y}_\mathrm{N}(\boldsymbol{\theta}_\mathrm{R})\,\mathrm{d}\boldsymbol{\theta}_\mathrm{R}$, to get the set of Nth-order Ambisonic room impulse responses

$$\boldsymbol{h}_\mathrm{N}(t) = h(t)\, \boldsymbol{y}_\mathrm{N}[\boldsymbol{\theta}_\mathrm{DOA}(t)].$$

A signal $s(t)$ convolved with this vector of impulse responses theoretically generates a 3D Ambisonic image of the mono sound in the measured room. This can be done, e.g., by the plug-in mcfx\_convolver. Now there are two problems to be solved: (i) how to estimate $\boldsymbol{\theta}_\mathrm{DOA}(t)$, and (ii) how to deal with the diffuse part of $h(t)$, when more than one sound arrives at a time.

*Estimation of the DOA*. One could just detect the temporal peaks of the room impulse response and assign a guessed evolution of the direction of arrival as suggested in [42], and hereby span the envelopment of the room impulse response. Alternatively, if the room impulse response was recorded by a microphone array as in [38], array processing can be used to estimate the direction of arrival $\boldsymbol{\theta}_\mathrm{DOA}(t)$. For first-order Ambisonic microphone arrays, when suitably band-limited to the frequency range in which the directional mapping is correct, e.g. between 200 Hz and 4 kHz, the vector $\boldsymbol{r}_\mathrm{DOA}$ of Eq. (A.83) in Appendix A.6.2 yields a suitable estimate

$$
\tilde{\boldsymbol{r}}_\mathrm{DOA}(t) = W(t) \begin{bmatrix} X(t) \\ Y(t) \\ Z(t) \end{bmatrix} = -\rho c \, h(t) \, \boldsymbol{v}(t), \qquad \boldsymbol{\theta}_\mathrm{DOA}(t) = \frac{\tilde{\boldsymbol{r}}_\mathrm{DOA}(t)}{\|\tilde{\boldsymbol{r}}_\mathrm{DOA}(t)\|}. \tag{5.19}
$$
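Eq. (5.19) amounts to a sample-wise product of the omnidirectional signal with the three first-order signals, followed by normalization. A minimal sketch of this estimator follows; the band-limiting to 200 Hz–4 kHz and any temporal smoothing of the pseudo-intensity vector are omitted here and would be applied in practice.

```python
import numpy as np

def doa_from_bformat(w, x, y, z):
    """Sample-wise DOA estimate from first-order (B-format) RIR signals,
    following Eq. (5.19): r_DOA(t) = W(t) [X(t), Y(t), Z(t)]^T, normalized."""
    r = np.vstack([w * x, w * y, w * z])   # pseudo-intensity vector, shape (3, T)
    norm = np.linalg.norm(r, axis=0)
    norm = np.where(norm > 0, norm, 1.0)   # avoid division by zero in silence
    return r / norm                        # unit vectors theta_DOA(t)
```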

Figure 5.16 shows the directional analysis of the first 100 ms of a first-order directional impulse response taken from the OpenAIR library.<sup>1</sup> This response was measured in St. Andrew's Church, Lyddington, UK (2600 m<sup>3</sup> volume, 11.5 m source-receiver distance) with a Soundfield SPS422B microphone.

<sup>1</sup>http://www.openairlib.net.

The direct sound from the front is clearly visible, as well as strong early reflections from front and back, and equally distributed weak directions from the diffuse reverb.

*Spectral decay recovery for higher-order RIRs*. The second task mentioned above arises because the multiplication of $h(t)$ by $\boldsymbol{y}_\mathrm{N}[\boldsymbol{\theta}_\mathrm{DOA}(t)]$ to obtain $\tilde{\boldsymbol{h}}(t)$ degrades the spectral decay at higher orders. Without further processing, the resulting response typically exhibits a noticeably increased spectral brightness [38, 41, 43]. This unnatural brightness mainly affects the diffuse reverberation tail, where *temporal disjointness* is a poor assumption. There, the corresponding rapid changes of $\boldsymbol{\theta}_\mathrm{DOA}(t)$ cause a strong amplitude modulation in the pre-processing of the late room impulse response at high Ambisonic orders. Typically, long decays of low frequencies leak into high frequencies and hereby result in an erroneous spectral brightening of the diffuse tail. Figure 5.17 analyses this behavior in terms of an erroneous increase of the reverberation time at high frequencies, especially when using high orders.

In order to equalize the spectral decay and hereby the reverberation time of the SDM-enhanced impulse response, there is a helpful pseudo-allpass property of the spherical harmonics for direct and diffuse fields, as described in Eqs. (A.52) and (A.55) of Appendix A.3.7. The signals in the vector $\tilde{\boldsymbol{h}}(t) = [\tilde{h}_n^m(t)]_{nm}$ are first decomposed into frequency bands, yielding the sub-band responses $\tilde{h}_n^m(t, b)$. We can equalize the spectral sub-band decay for every band $b$ and order $n$ by targeting fulfillment of the pseudo-allpass property

$$\sum_{m=-n}^{n} \mathcal{E}\{|h_n^m(t,b)|^2\} = (2n+1)\,\mathcal{E}\{|h_0^0(t,b)|^2\}.\tag{5.20}$$

The formulation above relies on the correct spectral decay of the omnidirectional signal $h_0^0(t, b) = \tilde{h}_0^0(t, b)$, which is unaffected by modulation. Correction is achieved by

$$h_n^m(t,b) = \tilde{h}_n^m(t,b)\, \sqrt{ \frac{(2n+1)\,\mathcal{E}\{ |\tilde{h}_0^0(t,b)|^2 \}}{\sum_{m=-n}^n \mathcal{E}\{ |\tilde{h}_n^m(t,b)|^2 \}} };\tag{5.21}$$

here, the expression $\mathcal{E}\{|\cdot|^2\}$ refers to estimation of the squared signal envelope.
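The correction of Eq. (5.21) can be sketched for one sub-band as follows. The moving-average envelope estimator for $\mathcal{E}\{|\cdot|^2\}$ and the ACN channel ordering are assumed choices made here for illustration.

```python
import numpy as np

def acn_order(i):
    """Ambisonic order n of ACN channel index i (n = floor(sqrt(i)))."""
    return int(np.sqrt(i))

def equalize_decay(h_tilde, smooth=32):
    """Apply the per-order gain of Eq. (5.21) to one sub-band of an
    Ambisonic RIR, shape (channels, samples), channels in ACN order.
    E{|.|^2} is estimated by a moving average of the squared signal."""
    kern = np.ones(smooth) / smooth
    env = np.array([np.convolve(ch**2, kern, mode='same') for ch in h_tilde])
    n_max = acn_order(h_tilde.shape[0] - 1)
    out = np.empty_like(h_tilde)
    for n in range(n_max + 1):
        ch = [i for i in range(h_tilde.shape[0]) if acn_order(i) == n]
        denom = env[ch].sum(axis=0)                 # sum over m of E{|h_n^m|^2}
        gain = np.sqrt((2*n + 1) * env[0] / np.maximum(denom, 1e-12))
        out[ch] = h_tilde[ch] * gain
    return out
```

By construction, the omnidirectional channel ($n = 0$) passes unchanged, as required by the text above.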

*Perceptual evaluation*. Frank's 2016 experiments [44] measuring the area of the sweet spot also investigated the plausibility of reverberation created from SDM-processed Ambisonic measurements at different order settings, N = 1, 3, 5. For Fig. 5.18b, listeners indicated the distance from the room's center at which they heard envelopment beginning to collapse towards the nearest loudspeakers. One can observe that rendering diffuse reverberation for a large audience benefits from a high Ambisonic order. Moreover, experiments in [43] revealed an improvement of the perceived spatial depth mapping, i.e. a clearer separation between foreground and background sound, for the SDM-processed higher-order reverberation, cf. Fig. 1.21b.

**Fig. 5.18** The perceptual sweet-spot size as investigated by Frank [44] for SDM-processed RIRs covers an area in the IEM CUBE that increases with the chosen SDM order N (black = 5th, gray = 3rd, light gray = 1st order Ambisonics). As with panned direct sound, one should keep some distance from the loudspeakers to avoid a breakdown of envelopment

#### **5.8 Resolution Enhancement: DirAC, HARPEX, COMPASS**

The concept of parametric audio processing [5] describes ways to obtain resolution-enhanced first-order Ambisonic recordings by parametric decomposition and rendering. One main idea is to decompose the short-term stationary signals of a sound scene into a *directional* and a less directional, *diffuse* stream.

For synthesis of the *directional* part based on mono signals, it is clear how to obtain the narrowest presentation: by amplitude panning or higher-order Ambisonic panning with consistent $r_\mathrm{E}$-vector predictions as in Chap. 2.

The synthesis of *diffuse and enveloping* parts based on a mono signal can require extra processing, such as widening/diffuseness effects or reverberation as in Sects. 5.5 and 5.6, which both also provide a directionally wide distribution of sound. Or, more practically, the recording itself could deliver sufficiently many uncorrelated instances of the diffuse sound to be played back by surrounding virtual sources. Envelopment and diffuseness rely on providing a consistently *low interaural covariance or cross-correlation* by sufficiently strong decorrelation.

*DirAC*. A main goal of DirAC (Directional Audio Coding [5]) is finding signals and parameters for sound rendering by analyzing first-order Ambisonic recordings. One variant is to use the intensity-vector-based analysis in the short-term Fourier transform (STFT), see also Appendix A.6.2:

$$\boldsymbol{r}_\mathrm{DOA}(t,\omega) = -\frac{\rho c \,\Re\{p(t,\omega)^* \boldsymbol{v}(t,\omega)\}}{|p(t,\omega)|^2} = \frac{\Re\{W(t,\omega)^* [X(t,\omega), Y(t,\omega), Z(t,\omega)]^\mathrm{T}\}}{\sqrt{2}\,|W(t,\omega)|^2},\tag{5.22}$$

which can be treated similarly to the $r_\mathrm{E}$ vector, regarding direction and diffuseness $\psi = 1 - \|\boldsymbol{r}_\mathrm{DOA}\|_2$.
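A per-bin evaluation of Eq. (5.22) and the diffuseness can be sketched as below. The traditional B-format convention that $W$ carries a $-3$ dB factor (the $\sqrt{2}$ in Eq. 5.22) is assumed, and no temporal averaging of the estimates is included.

```python
import numpy as np

def dirac_params(W, X, Y, Z):
    """DOA vector and diffuseness per STFT bin following Eq. (5.22):
    r = Re{W* [X, Y, Z]^T} / (sqrt(2) |W|^2),  psi = 1 - ||r||."""
    num = np.real(np.conj(W)[..., None] * np.stack([X, Y, Z], axis=-1))
    r = num / (np.sqrt(2) * np.abs(W)[..., None]**2 + 1e-12)
    psi = 1.0 - np.linalg.norm(r, axis=-1)
    return r, np.clip(psi, 0.0, 1.0)
```

For a single plane wave, the estimate yields a unit-length $\boldsymbol{r}_\mathrm{DOA}$ and hence $\psi = 0$; for ideally diffuse bins, $\boldsymbol{r}_\mathrm{DOA}$ shrinks and $\psi$ approaches 1.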

*Single-channel DirAC* is Ville Pulkki's original way to decompose the $W(t, \omega)$ signal in the STFT domain into a directional signal $\sqrt{1 - \psi}\, W(t, \omega)$ that is synthesized by amplitude panning, and a diffuse signal $\sqrt{\psi}\, W(t, \omega)$ to be synthesized diffusely [45]. *Virtual-microphone DirAC* uses a first-order Ambisonic decoder for the given loudspeaker layout and time-frequency-adaptive sharpening masks increasing the focus of direct sounds, see Vilkamo [46] and [5, Ch. 6], or e.g. Sect. 5.2.3. Playback of diffuse sounds benefits from an optional diffuseness effect.

*HARPEX* (high angular-resolution plane-wave expansion [47]) is Svein Berge's patented solution to optimally decode sub-band signals. It is based on the observation he made with Natasha Barrett that decoding to a tetrahedral loudspeaker layout performs perceptually best if the tetrahedron nodes are rotationally aligned with the sources of the recording. HARPEX accomplishes convincing diffuse and direct sound reproduction by decoding to a variably adapted virtual loudspeaker layout in every sub-band. The layout is adaptively rotation-aligned with the sources detected in the band. HARPEX is typically described using an estimator for direction pairs.

*COMPASS* (COding and Multidirectional Parameterization of Ambisonic Sound Scenes [48]) by Archontis Politis can be seen as an extension of DirAC. In contrast to DirAC, it tries to detect and separate multiple direct sound sources from the ambient or background sound. This is done by applying two different kinds of beamformers: one that contains only the direct sound for each sound source (source signals) and one that contains everything but the direct sound (ambient signal). As before, the source signals are reproduced using amplitude panning, and the ambient signal is sent to the decorrelator. In contrast to DirAC, COMPASS is not limited to first-order input but can also enhance the spatial resolution of higher-order inputs.

#### **5.9 Practical Free-Software Examples**

#### *5.9.1 IEM, ambix, and mcfx Plug-In Suites*

The ambix\_converter is an important tool for adapting between the different Ambisonic scaling conventions, e.g. the standard SN3D normalization that uses only $\sqrt{\frac{(n-|m|)!}{(n+|m|)!}\,\frac{2-\delta_m}{4\pi}}$ for normalization instead of the full $\sqrt{\frac{(n-|m|)!}{(n+|m|)!}\,\frac{(2-\delta_m)(2n+1)}{4\pi}}$ that is called N3D, see Fig. 5.19. This alternative definition is a practical choice of the ambix format [49] to avoid high-order channels becoming louder than the zeroth-order channel. The converter also permits adapting between channel sequences such as ACN's $i = n^2 + n + m$ or SID's $i = n^2 + 2(n - |m|) + (m < 0)$. It is advisable to use test recordings with the main directions, e.g. front, left, top, and to check that the channel separation for decoded material roughly exceeds 20 dB for 5th-order material. Moreover, the converter contains inversion of the Condon-Shortley phase, which typically causes a 180° rotation around the $z$ axis, and it contains the left-right, front-back, and top-bottom flips discussed in the mirroring operations above.
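The channel-index formulas and the SN3D/N3D relation quoted above can be written out directly; the per-channel gain $\sqrt{2n+1}$ follows from the ratio of the two normalization factors:

```python
import math

def acn(n, m):
    """ACN channel index i = n^2 + n + m."""
    return n * n + n + m

def sid(n, m):
    """SID channel index i = n^2 + 2(n - |m|) + (m < 0)."""
    return n * n + 2 * (n - abs(m)) + (1 if m < 0 else 0)

def sn3d_to_n3d(n):
    """Gain converting an SN3D-normalized channel of order n to N3D."""
    return math.sqrt(2 * n + 1)
```

For first order, ACN yields the sequence Y, Z, X (indices 1, 2, 3), whereas SID yields X, Y, Z, which is exactly the kind of reordering the converter performs.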

The ambix\_warping plug-in, see Fig. 5.20, implements the above-mentioned warping operations shifting horizontal sounds towards one of the poles, or into both polar directions. Warping can be applied to any direction other than zenith and nadir by placing it between two mutually inverting ambix\_rotation or IEM SceneRotator objects that intermediately rotate the zenith to another direction.

The IEM SceneRotator, like the ambix\_rotation plug-in, can be controlled by head tracking and is essential for an immersive headphone-based experience, see Fig. 5.21. Its processing is done as described above.

The ambix\_directional\_loudness plug-in in Fig. 5.22 implements the above-mentioned directional amplitude window in either circular or equi-rectangular spherical shape. Several of these windows can be created, soloed, and remote-controlled, each of which allows setting a gain for the inside and outside region. This is often useful in practice, e.g., when reinforcing or attenuating desired or undesired signal parts within an Ambisonic scene.


**Fig. 5.20** ambix\_warping plug-in in Reaper

**Fig. 5.21** IEM SceneRotator and ambix\_rotator plug-ins

**Fig. 5.22** ambix\_directional\_loudness plug-in

**Fig. 5.23** EnergyVisualizer plug-in

To observe the changes made to the Ambisonic scene, the IEM Energy Visualizer can be helpful, see Fig. 5.23.

If, for instance, the Ambisonic scene requires dynamic compression, as outlined in the section above, the IEM OmniCompressor is a helpful tool. It uses the omnidirectional Ambisonic channel to derive the compression gains (as a side-chain for all other Ambisonic channels). Similarly to the directional\_loudness plug-in, the IEM DirectionalCompressor allows selecting a window, but this time for setting different dynamic compression within and outside the selected window, see Fig. 5.24.

The multichannel mcfx\_filter plug-in in Fig. 5.25 not only implements a set of parametric equalizers and a low and high cut that can be toggled between filter skirts

**Fig. 5.24** OmniCompressor and DirectionalCompressor plug-in

**Fig. 5.25** mcfx\_filter plug-in

of either 2nd or 4th order, but also features a real-time spectrum analyzer to observe the changes done to the signal. It is not only practical for Ambisonic purposes: it is simply a set of parametric filters that is applied equally to all channels and controlled from one interface.

The mcfx\_convolver plug-in in Fig. 5.26 is useful for many purposes, also scientific ones, e.g., when testing binaural filters or driving multi-channel arrays with filters, etc. Its configuration files use the jconvolver format that specifies which filter file (typically stored in multi-channel wav files) connects which of its multiple inlets to which of its multiple outlets. It is also used to implement the SDM-based reverberation described in the above sections.

For computationally cheaper reverberation, the IEM FDNReverb plug-in described above can be used, see Fig. 5.27. It is not specifically an Ambisonic tool, but can

**Fig. 5.26** mcfx\_convolver plug-in

**Fig. 5.27** FDNReverb plug-in

be used in any multi-channel environment. The particularity of the implementation in the IEM suite is that a slow onset can be adjusted.

The ambix\_widening plug-in in Fig. 5.28 implements the widening by frequency-dependent, dispersive rotation of the Ambisonic scene around the *z* axis as described above. It can also be used to cheaply stylize lateral reflections instead of the IEM RoomEncoder (Fig. 4.36) with time-constant settings exceeding 5 ms, or just as a


**Fig. 5.28** ambix\_widening plug-in in Reaper


**Fig. 5.29** mcfx\_gain\_delay plug-in

widening effect. The setting *single-sided* permits to suppress the slow attack of the Bessel sequence.

Another quite helpful tool is the mcfx\_gain\_delay plug-in in Fig. 5.29. It permits soloing or muting individual channels, as well as delaying and attenuating them individually. What is more, and often even more useful: it is invaluable for testing the signal chain, as one can step through the channels with different signals.

#### *5.9.2 Aalto SPARTA*

The SPARTA plug-in suite by Aalto University provides Ambisonic tools for encoding, decoding on loudspeakers and headphones, as well as visualization. A special feature is the COMPASS decoder plug-in Fig. 5.30 that can increase the spatial resolution of first-, second-, and third-order recordings. Playback can be done either

**Fig. 5.30** COMPASS Decoder plug-in

on arbitrary loudspeaker arrangements or their virtualization on headphones. The signal-dependent parametric processing allows adjusting the balance between direct and diffuse sound in each frequency band. In order to suppress artifacts due to the processing, the parametric playback (Par) can be mixed with the static decoding (Lin) of the original recording. While it is generally advisable to keep the parametric contribution below 2/3 for noticeable directional improvement at low artifacts, in recordings with cymbals or hi-hats it is advisable to fade towards Lin starting at around 4 kHz.

#### *5.9.3 Røde*

The Soundfield plug-in by Røde in Fig. 5.31 was originally designed to process the signals from the four cardioid microphone capsules of their Soundfield microphone. However, it also supports first-order Ambisonics as input format. It can decode to various loudspeaker arrangements by placing virtual microphones into the directions of the loudspeakers. The directivity of each virtual microphone can be adjusted between first-order cardioid and hyper-cardioid. Moreover, higher-order directivity patterns are possible using a parametric signal-dependent processing, resulting in an increase of the spatial resolution.

**Fig. 5.31** Soundfield by Røde plug-in

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)**

*…a turning point has been the design of HOA microphones, opening an exciting experimental field in terms of real 3D sound field recording …*

Jérôme Daniel [1] at Ambisonics Symposium 2009.

**Abstract** Unlike pressure-gradient transducers, single-transducer microphones with higher-order directivity apparently turned out to be difficult to manufacture at reasonable audio quality. Therefore nowadays, higher-order Ambisonic recording with compact devices is based on compact spherical arrays of pressure transducers. To prepare for higher-order Ambisonic recording based on arrays, we first need a model of the sound pressure that the individual transducers of such an array would receive in an arbitrary surrounding sound field. The lossless, linear wave equation is the most suitable model to describe how sound propagates when the sound field is composed of surrounding sound sources. Fundamentally, the wave equation models sound propagation by how small packages of air react (i) when being expanded or compressed by a change of the internal pressure, and (ii) to directional differences in the outside pressure by starting to move. Based thereupon, the inhomogeneous solutions of the wave equation describe how an entire free sound field builds up when excited by an omnidirectional sound source, as a simplified model of an arbitrary physical source, such as a loudspeaker, human talker, or musical instrument. After addressing these basics, the chapter shows a way to get Ambisonic signals of high spatial and timbral quality from the array signals, considering the necessary diffuse-field equalization, side-lobe suppression, and trade-off between spatial resolution and low-frequency noise boost. The chapter concludes with application examples.

Gary Elko and Jens Meyer are the well-known inventors of the first commercially available compact spherical microphone array that is able to record higher-order Ambisonics [2], the Eigenmike. There are several inspiring scientific works with valuable contributions that can be recommended for further reading [3–12], above all Boaz Rafaely's excellent introductory book [13].

This mathematical theory might appear extensive, but it cannot be avoided when aiming at an in-depth understanding of higher-order Ambisonic microphones. The theory enables processing of the microphone signals received such that the surrounding sound field excitation is retrieved in terms of an Ambisonic signal. Some readers may want to skip the physical introduction and resume in Sect. 6.5 on spherical scattering or Sect. 6.6 on the processing of the array signals.

#### **6.1 Equation of Compression**

Wave propagation involves *reversible* short-term temperature fluctuations becoming effective when air is being compressed by sound, causing the specific stiffness of air in sound propagation. Appendix A.6.1 shows how to derive this adiabatic compression relation based on the first law of thermodynamics and the ideal gas law. It relates the relative volume change $\frac{\Delta V}{V_0}$ to the pressure change $p = -K\,\frac{\Delta V}{V_0}$ by the bulk modulus of air. After expressing the bulk modulus by more common constants<sup>1</sup> $K = \rho\, c^2$ and differentially formulating the volume change over time using the change of the sound particle velocity in space, e.g. in one dimension $\dot{p} = -\rho\, c^2\, \frac{\partial v_x}{\partial x}$, cf. Appendix A.6.1, we get the three-dimensional compression equation:

$$\frac{\partial p}{\partial t} = -\rho \, c^2 \, \nabla^\mathrm{T} \mathbf{v}.\tag{6.1}$$

Here the inner product of the Del symbol $\nabla^\mathrm{T} = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial z}\right)$ with $\boldsymbol{v}$ yields what is called the divergence, $\mathrm{div}(\boldsymbol{v}) = \nabla^\mathrm{T}\boldsymbol{v} = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z}$. The equation means: *Independently of whether the outer boundaries of a small package of air are traveling at a common velocity: if there are directions into which their velocity is spatially increasing, the resulting gradual volume expansion over time causes a proportional decrease of the interior pressure over time.*

#### **6.2 Equation of Motion**

The equation of motion is relatively simple to understand from the Newtonian equation of motion; e.g. for the $x$ direction, $F_x = m\,\frac{\partial v_x}{\partial t}$ equates the external force to mass $m$ times acceleration, i.e. the increase in velocity $\frac{\partial \boldsymbol{v}}{\partial t}$. For a small package of air with constant volume $V_0 = \Delta x\, \Delta y\, \Delta z$, the mass is obtained by the air density, $m = \rho\, V_0$, and the force equals the decrease in pressure over the three space directions, times the corresponding partial surface, e.g. for the $x$ direction

<sup>1</sup>Typical constants are: density $\rho = 1.2\ \mathrm{kg/m^3}$, speed of sound $c = 343\ \mathrm{m/s}$.

$F_x = -[p(x + \Delta x) - p(x)]\,\Delta y\, \Delta z$. For the $x$ direction, this yields after expanding by $\frac{\Delta x}{\Delta x}$

$$-\frac{\Delta p}{\Delta x}\,V_0 = \rho\,V_0\,\frac{\partial v_x}{\partial t}.$$

Dividing by $-V_0$ and letting $V_0 \to 0$, we obtain the typical shape of the equation of motion for all three space directions

$$
\nabla p = -\rho \frac{\partial \boldsymbol{v}}{\partial t}.\tag{6.2}
$$

The equation of motion means: *Independently of the common exterior pressure load on all the outer boundaries of a small air package, an outer pressure decrease into any direction implies a corresponding pushing force on the package causing a proportional acceleration into this direction.*

#### **6.3 Wave Equation**

We can combine the compression equation $\frac{\partial p}{\partial t} = -\rho\, c^2\, \nabla^\mathrm{T}\boldsymbol{v}$ with the equation of motion $\nabla p = -\rho\, \frac{\partial \boldsymbol{v}}{\partial t}$ by differentiating the first one with regard to time, $\frac{\partial^2 p}{\partial t^2} = -\rho\, c^2\, \nabla^\mathrm{T}\frac{\partial \boldsymbol{v}}{\partial t}$, and applying the gradient $\nabla^\mathrm{T}$ to the second one, yielding the Laplacian $\nabla^\mathrm{T}\nabla = \Delta$, hence $\Delta p = -\rho\,\nabla^\mathrm{T}\frac{\partial \boldsymbol{v}}{\partial t}$. Division of the first result by $c^2$ and equating both terms yields the lossless wave equation $\Delta p = \frac{1}{c^2}\frac{\partial^2}{\partial t^2}p$ that is typically written as

$$
\left(\Delta - \frac{1}{c^2} \frac{\partial^2}{\partial t^2}\right) p = 0.\tag{6.3}
$$

Obviously, the wave equation relates the curvature in space (expressed by the Laplacian) to curvature in time (expressed by the second-order derivative).
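The two first-order equations (6.1) and (6.2) can also be marched numerically, which illustrates the propagation they describe. The following 1-D finite-difference sketch (the grid size, pulse shape, and rigid boundaries are illustrative assumptions) shows a pressure pulse splitting into two halves that travel at the speed of sound:

```python
import numpy as np

# 1-D staggered-grid (leapfrog) discretization of the compression
# equation (6.1) and the equation of motion (6.2).
rho, c = 1.2, 343.0          # density and speed of sound
dx = 0.01                    # grid spacing in metres
dt = dx / c                  # time step at the 1-D stability limit
nx, nt = 400, 150
p = np.zeros(nx)             # pressure at cell centres
v = np.zeros(nx + 1)         # velocity on the staggered grid (rigid ends)
p[190:210] = np.hanning(20)  # initial pressure pulse in the middle
for _ in range(nt):
    v[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])   # Eq. (6.2): -grad p accelerates v
    p -= rho * c**2 * dt / dx * (v[1:] - v[:-1])    # Eq. (6.1): -div v raises p
# after nt steps, both half-amplitude pulses have moved nt grid cells (c*dt = dx)
```

After 150 steps, the two halves of the pulse are centred near cells 50 and 350, consistent with d'Alembert's solution of the wave equation.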

If $p$ is a pure sinusoidal oscillation $\sin(\omega t + \varphi_0)$, the second derivative in time corresponds to a factor $-\omega^2$, and by substitution with the wave number $k = \frac{\omega}{c}$, we can write the frequency-domain wave equation as

$$\left(\bigtriangleup + k^{2}\right)p = 0,\qquad\qquad\text{Helmholtz equation.}\tag{6.4}$$

#### *6.3.1 Elementary Inhomogeneous Solution: Green's Function (Free Field)*

The Green's function is an elementary prototype for solutions to inhomogeneous problems $(\Delta + k^2)\,p = -q$; it is defined by

$$(\Delta + k^2)\,G = -\delta.$$

A general excitation $q$ of the equation can be represented by its convolution with the Dirac delta distribution, $\int q(\boldsymbol{s})\, \delta(\boldsymbol{r} - \boldsymbol{s})\, \mathrm{d}V(\boldsymbol{s}) = q(\boldsymbol{r})$. Consequently, as the wave equation is linear, the general solution must also equal the convolution of the Green's function with the excitation function over space, $p(\boldsymbol{r}) = \int q(\boldsymbol{s})\, G(\boldsymbol{r} - \boldsymbol{s})\, \mathrm{d}V(\boldsymbol{s})$; if formulated in the time domain: also over time. The integral superimposes the acoustical responses of any point in time and space of the source phenomenon, weighted by the corresponding source strength in space and time.

The Green's function in three dimensions is derived in Appendix A.6.3, Eq. (A.91),

$$G = \frac{e^{-\mathrm{i}kr}}{4\pi r},\tag{6.5}$$

with the wave number $k = \frac{\omega}{c}$ and the distance between source and receiver $r = \|\boldsymbol{r} - \boldsymbol{r}_\mathrm{s}\|_2$.

Acoustic source phenomena are characterized by the behavior of the Green's function: far away, the amplitude decays with $\frac{1}{r}$, and the phase $-kr = -\omega\frac{r}{c}$ corresponds to the radially increasing delay $\frac{r}{c}$. Both are expressed in Sommerfeld's radiation condition $\lim_{r\to\infty} r\left(\frac{\partial}{\partial r}p + \mathrm{i}k\, p\right) = 0$.

*Plane waves*. The radius coordinate of the Green's function is the distance between two Cartesian position vectors $\boldsymbol{r}_\mathrm{s}$ and $\boldsymbol{r}$, the source and receiver locations. Letting one of them become large is denoted by re-expressing it in terms of radius and direction vector, $\boldsymbol{r}_\mathrm{s} = r_\mathrm{s}\boldsymbol{\theta}_\mathrm{s}$. This permits the far-field approximation

$$\|\boldsymbol{r}_\mathrm{s} - \boldsymbol{r}\| = \sqrt{(r_\mathrm{s}\boldsymbol{\theta}_\mathrm{s} - \boldsymbol{r})^\mathrm{T}(r_\mathrm{s}\boldsymbol{\theta}_\mathrm{s} - \boldsymbol{r})} = \sqrt{r_\mathrm{s}^2 - 2r_\mathrm{s}\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{r} + r^2} \tag{6.6}$$

$$\lim_{r_\mathrm{s}\to\infty} \|\boldsymbol{r}_\mathrm{s} - \boldsymbol{r}\| = \lim_{r_\mathrm{s}\to\infty} r_\mathrm{s}\sqrt{1 - 2\frac{\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{r}}{r_\mathrm{s}} + \frac{r^2}{r_\mathrm{s}^2}} = r_\mathrm{s} - \boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{r}, \qquad \left(\text{with } \lim_{x\to 0}\sqrt{1 - 2x} = 1 - x\right).$$

For the *phase approximation*: at a wave length of 30 cm, for instance, we notice that even a relatively small distance difference, e.g. between 15 m and 15 m + 15 cm, could change the sign of the wave. To approximate the phase of the Green's function, we must therefore at least use $r_\mathrm{s} - \boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{r}$ as approximation. By contrast, this level of precision is irrelevant for the *magnitude approximation*: the difference would be negligible if we used $\frac{1}{15\,\mathrm{m}}$ instead of the magnitude $\frac{1}{15\,\mathrm{m} + 15\,\mathrm{cm}}$.
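The quality of the far-field distance approximation of Eq. (6.6) is easy to check numerically; the source direction and receiver position below are arbitrary example values:

```python
import numpy as np

# Error of the far-field approximation ||r_s theta_s - r|| ≈ r_s - theta_s^T r
theta_s = np.array([0.6, 0.8, 0.0])   # unit source-direction vector
r = np.array([0.1, 0.2, -0.05])       # receiver position in metres

def distance_error(r_s):
    """Exact source-receiver distance minus its far-field approximation."""
    exact = np.linalg.norm(r_s * theta_s - r)
    return exact - (r_s - theta_s @ r)
```

The error shrinks roughly in proportion to $1/r_\mathrm{s}$, which is why the approximation is safe for the phase once the source is sufficiently far away.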

At a large distance *r*<sup>s</sup> assumed to be constant, the Green's function is proportional to a *plane wave* from the source direction *θ*<sup>s</sup>

$$\lim_{r_\mathrm{s} \to \infty} G = \frac{e^{-\mathrm{i}k r_\mathrm{s}}}{4\pi r_\mathrm{s}} \, e^{\mathrm{i}k \, \boldsymbol{\theta}_\mathrm{s}^\mathrm{T} \boldsymbol{r}}. \tag{6.7}$$

The plane-wave part is of unit magnitude |*p*| = 1

$$p = e^{\mathrm{i}k\,\boldsymbol{\theta}_\mathrm{s}^\mathrm{T} \boldsymbol{r}},\tag{6.8}$$

and its phase evaluates the projection of the position vector onto the plane-wave arrival direction $\boldsymbol{\theta}_\mathrm{s}$. Towards the direction $\boldsymbol{\theta}_\mathrm{s}$, the phase grows positive, i.e. the signal arrives earlier. Towards the plane-wave propagation direction $-\boldsymbol{\theta}_\mathrm{s}$, the phase grows negative, implying an increasing time delay, which is constant on any plane perpendicular to $\boldsymbol{\theta}_\mathrm{s}$.

Plane waves are an invaluable tool to locally approximate, within a small region, the sound fields of sources that are sufficiently far away.<sup>2</sup>

#### **6.4 Basis Solutions in Spherical Coordinates**

Figure 4.11 shows spherical coordinates [14, 15] using radius $r$, azimuth $\varphi$, and zenith angle $\vartheta$. For simplification, the zenith angle is replaced by $\zeta = \cos\vartheta = \frac{z}{r}$ here. We may solve the Helmholtz equation $(\Delta + k^2)\,p = 0$ in spherical coordinates by the radial and directional parts of the Laplacian, $\Delta = \Delta_r + \Delta_{\varphi,\zeta}$, as identified in Appendix A.3

$$\Delta_r = \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}, \qquad \Delta_{\varphi,\zeta} = \frac{1 - \zeta^2}{r^2}\frac{\partial^2}{\partial \zeta^2} - \frac{2}{r^2}\,\zeta\,\frac{\partial}{\partial \zeta} + \frac{1}{r^2(1 - \zeta^2)}\frac{\partial^2}{\partial \varphi^2}. \tag{6.9}$$

We already know the spherical harmonics as directional eigensolutions from Sect. 4.7

$$\Delta_{\varphi,\zeta}\, Y_n^m = -\frac{n(n+1)}{r^2}\, Y_n^m \tag{6.10}$$

and assume them to be a factor of the solution $p_n^m = R\, Y_n^m$, determining the value of $\Delta_{\varphi,\zeta}$ in $(\Delta_r + k^2 + \Delta_{\varphi,\zeta})\,p_n^m = 0$. We find a separated radial differential equation after insertion, multiplication by $\frac{r^2}{Y_n^m}$, and re-expressing the differentials $\frac{\partial}{\partial r} = k\frac{\partial}{\partial kr}$ and $\frac{\partial^2}{\partial r^2} = k^2\frac{\partial^2}{\partial (kr)^2}$

$$\left[ (kr)^2 \frac{\partial^2}{\partial (kr)^2} + 2(kr) \frac{\partial}{\partial (kr)} + (kr)^2 - n(n+1) \right] R = 0. \tag{6.11}$$

Appendix A.6.4 shows how to get physical solutions for $R$ of this so-called *spherical Bessel differential equation*: spherical Hankel functions of the second kind $h_n^{(2)}(kr)$ are able to represent radiation (radially outgoing into every direction), consistently with Green's function $G$; they diverge with an $(n+1)$-fold pole at $kr = 0$, a physical behavior that would also be observed after spatially differentiating $G$, see Fig. 6.1. Spherical Bessel functions $j_n(kr) = \Re\{h_n^{(2)}(kr)\}$ are real-valued, converge everywhere, exhibit

<sup>2</sup>This is because, strictly speaking, an entire plane-wave sound field is unphysical and of infinite energy: either the exhaustive in-phase vibration of an infinite plane is required, or an infinite-amplitude point source infinitely far away is required with infinite anticipation $t_s \to +\infty$ (non-causal).

**Fig. 6.1** Spherical Bessel functions $j_n(kr) = \Re\{h_n^{(2)}(kr)\}$ (top left), imaginary part of spherical Hankel functions $\Im\{h_n^{(2)}(kr)\}$ (top right), and magnitude/dB of $|h_n^{(2)}(kr)|$ (bottom), over $kr$

an $n$-fold zero at $kr = 0$, and cannot represent radiation. Implementations typically rely on the accurate standard libraries implementing cylindrical Bessel and Hankel functions:

$$j\_n(kr) = \sqrt{\frac{\pi}{2} \frac{1}{kr}} J\_{n+\frac{1}{2}}(kr), \qquad h\_n^{(2)}(kr) = \sqrt{\frac{\pi}{2} \frac{1}{kr}} H\_{n+\frac{1}{2}}^{(2)}(kr). \tag{6.12}$$
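Where such library routines are unavailable (or for illustration), the spherical functions can also be sketched directly. The following plain-Python version is an illustrative sketch assuming the common convention $h_n^{(2)} = j_n - \mathrm{i}\,y_n$: it sums $j_n$ from its ascending power series (the naive upward recurrence for $j_n$ is numerically unstable for $n > kr$) and obtains $y_n$ by upward recurrence, which is stable because $y_n$ grows with the order:

```python
import math

def sph_jn(n, x):
    """Spherical Bessel function j_n(x), summed from its ascending
    power series (accurate for moderate arguments x)."""
    t = 1.0
    for m in range(1, n + 1):                    # t = x^n / (2n+1)!!
        t *= x / (2 * m + 1)
    s = t
    for k in range(60):                          # alternating series
        t *= -0.5 * x * x / ((k + 1) * (2 * n + 2 * k + 3))
        s += t
    return s

def sph_yn(n, x):
    """Spherical Neumann function y_n(x) by the upward recurrence
    f_{n+1}(x) = (2n+1)/x f_n(x) - f_{n-1}(x)."""
    y_prev = -math.cos(x) / x                          # y_0
    y_curr = -math.cos(x) / x**2 - math.sin(x) / x     # y_1
    if n == 0:
        return y_prev
    for m in range(1, n):
        y_prev, y_curr = y_curr, (2 * m + 1) / x * y_curr - y_prev
    return y_curr

def sph_h2(n, x):
    """Spherical Hankel function of the second kind."""
    return complex(sph_jn(n, x), -sph_yn(n, x))
```

The series and the recurrence can be cross-checked against the closed forms of the lowest orders, e.g. $j_0(x) = \frac{\sin x}{x}$ and $h_0^{(2)}(x) = \mathrm{i}\,\frac{e^{-\mathrm{i}x}}{x}$.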

*Wave spectra and spherical basis solutions*. Any sound field evaluated at a radius $r$ where the air is source-free and homogeneous in every direction can be represented by the spherical basis functions of enclosed fields, $j_n(kr)\,Y_n^m(\boldsymbol{\theta})$, and of radiating fields, $h_n^{(2)}(kr)\,Y_n^m(\boldsymbol{\theta})$

$$p = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \left[ b_{nm}\, j_n(kr) + c_{nm}\, h_n^{(2)}(kr) \right] Y_n^m(\boldsymbol{\theta}). \tag{6.13}$$

Here, *bnm* are the coefficients for *incoming waves* that pass through and emanate from radii larger than *r* and *cnm* are the coefficients of *outgoing waves* radiating from sources at radii smaller than *r*; the coefficients are called *wave spectra* of the incoming and outgoing waves, cf. [16].

*Ambisonic plane-wave spectrum, plane wave*. Plane waves only use the coefficients $b_{nm}$, while $c_{nm} = 0$ in Eq. (6.13). The sum of incoming plane waves from all directions, whose amplitudes are given by the spherical harmonic coefficients $\chi_{nm}$ as a set of Ambisonic signals, is described by the *incoming* wave spectrum, see Appendix A.6.5, Eq. (A.119)

$$b\_{nm} = 4\pi \text{ i}^n \ \chi\_{nm}.\tag{6.14}$$

Figure 6.2 shows a single plane wave incoming from the direction *θ*<sup>s</sup> represented by

$$b\_{nm} = 4\pi \text{ i}^n \ Y\_n^m(\theta\_s) \tag{6.15}$$

**Fig. 6.2** Plane wave from the $y$ axis ($\varphi = \vartheta = \frac{\pi}{2}$) in horizontal cross section; time steps correspond to $0^\circ$, $60^\circ$, $120^\circ$, and $180^\circ$ phase shifts $\phi$ in the plot $\Re\{p\, e^{\mathrm{i}\phi}\}$, showing $p$ from Eq. (6.13) with $c_{nm} = 0$ and $b_{nm} = 4\pi\,\mathrm{i}^n\, Y_n^m(\frac{\pi}{2}, \frac{\pi}{2})$ of Eq. (6.15); long wave (top), short wave (bottom); simulation uses N = 25 and the area shows $|kx|, |ky| < 2\pi$ and $8\pi$

at four different time steps corresponding to $0^\circ$, $60^\circ$, $120^\circ$, and $180^\circ$ phase shifts for the two wave lengths shown.
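For a single plane wave, the addition theorem collapses the $m$-summation of Eqs. (6.13) and (6.15) to Legendre polynomials of $\cos\Theta$, with $\Theta$ the angle between $\boldsymbol{\theta}$ and $\boldsymbol{\theta}_s$: $e^{\mathrm{i}kr\cos\Theta} = \sum_{n} \mathrm{i}^n (2n+1)\, j_n(kr)\, P_n(\cos\Theta)$. A small sketch (plain Python, with a series-based $j_n$ as a portability assumption) verifies the truncated expansion:

```python
import math

def sph_jn(n, x):
    """j_n(x) from its ascending power series."""
    t = 1.0
    for m in range(1, n + 1):
        t *= x / (2 * m + 1)
    s = t
    for k in range(60):
        t *= -0.5 * x * x / ((k + 1) * (2 * n + 2 * k + 3))
        s += t
    return s

def legendre(n_max, mu):
    """Legendre polynomials P_0(mu)..P_{n_max}(mu) by recurrence."""
    P = [1.0, mu]
    for n in range(1, n_max):
        P.append(((2 * n + 1) * mu * P[n] - n * P[n - 1]) / (n + 1))
    return P[:n_max + 1]

def plane_wave_truncated(x, mu, N):
    """Order-N truncation of e^{i x mu} = sum_n i^n (2n+1) j_n(x) P_n(mu)."""
    P = legendre(N, mu)
    return sum(1j**n * (2 * n + 1) * sph_jn(n, x) * P[n] for n in range(N + 1))
```

The truncation becomes accurate once N exceeds $kr$ by a few orders, in line with the later remark that aliasing-free operation is confined to $kr < \mathrm{N}$.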

#### **6.5 Scattering by Rigid Higher-Order Microphone Surface**

Higher-order Ambisonic microphone arrays are typically mounted on a rigid sphere of some radius $r = \mathrm{a}$, such as the Eigenmike EM32, see Fig. 6.3. The physical boundary condition of the rigid spherical surface is a vanishing radial component of the sound particle velocity. The radial sound particle velocity is obtained via the

**Fig. 6.3** 32-channel higher-order Ambisonic mic. Eigenmike EM32

**Fig. 6.4** Plane waves scattered by a rigid sphere, $k\mathrm{a} = \pi$ (top) and $k\mathrm{a} = 4\pi$ (bottom); time steps correspond to $0^\circ$, $60^\circ$, $120^\circ$, and $180^\circ$ phase shifts $\phi$ in the plot $\Re\{p\, e^{\mathrm{i}\phi}\}$, showing $p$ from Eq. (6.13) with $b_{nm} = 4\pi\,\mathrm{i}^n\, Y_n^m(\frac{\pi}{2}, \frac{\pi}{2})$ from Eq. (6.15) and $c_{nm}$ from Eq. (6.16); simulation uses N = 25

equation of motion Eq. (6.2) by differentiating Eq. (6.13) with respect to $r$. This requires evaluating the differentiated spherical radial solutions $j_n'(x)$ as well as $h_n'^{(2)}(x)$, which is implemented by $f_n'(x) = \frac{n}{x}\,f_n(x) - f_{n+1}(x)$ for either of the functions, cf. e.g. [16]. A sound-hard boundary condition at the radius $\mathrm{a}$ requires

$$v_r\big|_{r=\mathrm{a}} = \frac{\mathrm{i}}{\rho\, c} \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \left[ b_{nm}\, j_n'(kr) + c_{nm}\, h_n'^{(2)}(kr) \right]_{r=\mathrm{a}} Y_n^m(\boldsymbol{\theta}) = 0,$$

which is fulfilled by a vanishing term in square brackets. The rigid boundary responds to incoming surround sound by velocity-canceling outgoing waves, $h_n'^{(2)}(k\mathrm{a})\, c_{nm} = -j_n'(k\mathrm{a})\, b_{nm}$. The resulting coefficients $\psi_{nm}$ yield the sound pressure in Fig. 6.4,

$$p = \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} \psi\_{nm} Y\_n^m(\theta), \quad \text{with } \psi\_{nm} = \left[ j\_n(kr) - h\_n^{(2)}(kr) \frac{j\_n'(ka)}{h\_n'^{(2)}(ka)} \right]\_{r=a} b\_{nm}. \tag{6.16}$$

The two terms of the bracket are typically simplified further by a common denominator and by recognizing the Wronskian Eq. (A.97) in the numerator, $\frac{j_n(x)\,h_n'^{(2)}(x) - j_n'(x)\,h_n^{(2)}(x)}{h_n'^{(2)}(x)} = \frac{\mathrm{i}}{x^2\, h_n'^{(2)}(x)}$

$$\psi_{nm}\big|_{r=\mathrm{a}} = \frac{\mathrm{i}}{(k\mathrm{a})^2\, h_n'^{(2)}(k\mathrm{a})}\, b_{nm}. \tag{6.17}$$
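The equivalence of the bracket of Eq. (6.16) with this compact form can be verified numerically, at least in magnitude (the overall phase depends on the sign conventions of Appendix A, which are not restated here). The self-contained sketch below also uses the derivative rule $f_n'(x) = \frac{n}{x} f_n(x) - f_{n+1}(x)$ mentioned above:

```python
import math

def sph_jn(n, x):
    """j_n(x) from its ascending power series."""
    t = 1.0
    for m in range(1, n + 1):
        t *= x / (2 * m + 1)
    s = t
    for k in range(60):
        t *= -0.5 * x * x / ((k + 1) * (2 * n + 2 * k + 3))
        s += t
    return s

def sph_yn(n, x):
    """y_n(x) by stable upward recurrence."""
    y_prev = -math.cos(x) / x
    y_curr = -math.cos(x) / x**2 - math.sin(x) / x
    if n == 0:
        return y_prev
    for m in range(1, n):
        y_prev, y_curr = y_curr, (2 * m + 1) / x * y_curr - y_prev
    return y_curr

def sph_h2(n, x):
    return complex(sph_jn(n, x), -sph_yn(n, x))

def deriv(f, n, x):
    """Derivative rule f_n'(x) = (n/x) f_n(x) - f_{n+1}(x)."""
    return n / x * f(n, x) - f(n + 1, x)

def bracket(n, x):
    """Bracket of Eq. (6.16), evaluated at r = a, i.e. x = ka."""
    return sph_jn(n, x) - sph_h2(n, x) * deriv(sph_jn, n, x) / deriv(sph_h2, n, x)

def compact(n, x):
    """Magnitude of the Wronskian form of Eq. (6.17)."""
    return 1.0 / (x**2 * abs(deriv(sph_h2, n, x)))
```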

**Fig. 6.5** Attenuation/dB of Ambisonic signals of different orders for varying values of *k*a

*Relation of recorded sound pressure to Ambisonic signal*. The scattering equation relates the recorded sound pressure, expanded in spherical harmonics, to the Ambisonic signal of the surround-sound scene, see the frequency responses in Fig. 6.5,

$$\psi_{nm}\big|_{r=\mathrm{a}} = \frac{4\pi\,\mathrm{i}^{n+1}}{(k\mathrm{a})^2\, h_n'^{(2)}(k\mathrm{a})}\, \chi_{nm}. \tag{6.18}$$

It is formally convenient that as soon as the sound pressure is given in terms of its spherical harmonic coefficient signals ψ*nm*, the Ambisonic signals χ*nm* of a concentric playback system are obviously just an inversely filtered version thereof, with no need for further unmixing/matrixing.

Recognizable from Fig. 6.6 and following our intuition, waves of lengths larger than the diameter $2\mathrm{a}$ of the sphere will only weakly map to complicated high-order patterns. It is therefore easily understood that the transfer function $\mathrm{i}^{n+1}\,[(k\mathrm{a})^2\, h_n'^{(2)}(k\mathrm{a})]^{-1}$ attenuates the reception of high-order Ambisonic signals at low frequencies, see Fig. 6.5.
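This low-frequency behavior is quick to reproduce. The following self-contained sketch (plain Python; argument values illustrative) evaluates the transfer-function magnitude in dB and confirms that the attenuation grows with the order $n$ for small $k\mathrm{a}$:

```python
import math

def sph_jn(n, x):
    """j_n(x) from its ascending power series."""
    t = 1.0
    for m in range(1, n + 1):
        t *= x / (2 * m + 1)
    s = t
    for k in range(60):
        t *= -0.5 * x * x / ((k + 1) * (2 * n + 2 * k + 3))
        s += t
    return s

def sph_yn(n, x):
    """y_n(x) by stable upward recurrence."""
    y_prev = -math.cos(x) / x
    y_curr = -math.cos(x) / x**2 - math.sin(x) / x
    if n == 0:
        return y_prev
    for m in range(1, n):
        y_prev, y_curr = y_curr, (2 * m + 1) / x * y_curr - y_prev
    return y_curr

def sph_h2p(n, x):
    """h_n'^(2)(x) = (n/x) h_n^(2)(x) - h_{n+1}^(2)(x)."""
    h = lambda m: complex(sph_jn(m, x), -sph_yn(m, x))
    return n / x * h(n) - h(n + 1)

def order_gain_db(n, ka):
    """Magnitude in dB of i^{n+1} [(ka)^2 h_n'^(2)(ka)]^{-1}, cf. Fig. 6.5."""
    return 20 * math.log10(1.0 / (ka**2 * abs(sph_h2p(n, ka))))
```

At $k\mathrm{a} = 0.5$, the order-0 response stays within a few dB, while each higher order drops successively further, reproducing the spread of curves in Fig. 6.5.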

#### **6.6 Higher-Order Microphone Array Encoding**

The block diagram of Ambisonic encoding of higher-order microphone array signals is shown in Fig. 6.7. The first processing step decomposes the pressure samples $\boldsymbol{p}(t)$ from the microphone array into spherical harmonic coefficient signals $\boldsymbol{\psi}_\mathrm{N}(t)$: to which amount do the samples contain omnidirectional, figure-of-eight, and other spherical harmonic patterns, up to the order to which the microphone arrangement allows decomposition? The frequency-independent matrix $(\boldsymbol{Y}_\mathrm{N}^\mathrm{T})^\dagger$ does this conversion. It is the left inverse of the spherical harmonics sampled at the microphone positions, as shown in the upcoming section.

The second step then sharpens the sound pressure image to an Ambisonic signal by filtering the spherical harmonic coefficient signals. The basic relation between sound pressure coefficients and Ambisonic signals is given in Eq. (6.18) and describes a filter for every coefficient signal, with characteristics that differ only between the spherical harmonic orders. Robustness to noise, microphone matching, and positioning is the key here, and it is only achieved by careful design of these filters, as shown in further sections below. The design considers a gradually increasing sharpening over frequency, for which it moreover employs a filter bank with separate, max-$r_\mathrm{E}$-weighted and $E$-normalized bands, in order to provide (i) limitation of noise and errors, (ii) a frequency response perceived as flat, and (iii) optimal suppression of the sidelobes.

**Fig. 6.6** Plane-wave sound pressure image $\Re\{p\, e^{-\mathrm{i}k\mathrm{a}}\}$ on the rigid sphere with varying $k\mathrm{a}$, using $\psi_{nm}$ from Eq. (6.17) expanded over the spherical harmonics $p = \sum \psi_{nm} Y_n^m$ and $\chi_{nm} = Y_n^m(0, 0)$ for a plane wave from $z$; panels (a)–(h) show sphere diameters $2\mathrm{a}$ of $\frac{1}{32}$, $\frac{1}{2}$, $\frac{1}{16}$, $1$, $\frac{1}{8}$, $2$, $\frac{1}{4}$, and $4$ wave lengths, i.e. $k\mathrm{a} = \frac{\pi}{32}, \frac{\pi}{2}, \frac{\pi}{16}, \pi, \frac{\pi}{8}, 2\pi, \frac{\pi}{4}, 4\pi$. With the wave length $\lambda = \frac{c}{f}$, the value $k\mathrm{a}$ is related to the diameter $2\mathrm{a}$ by $\frac{k\mathrm{a}}{\pi} = \frac{2\pi f \mathrm{a}}{\pi c} = \frac{2\mathrm{a}}{\lambda}$ in wave lengths to express frequency dependency; simulation uses N = 50; for $\mathrm{a} = 4.2$ cm, the $k\mathrm{a}$ values correspond to $f$ = 125, 250, 500, 1000, 2000, 4000, 8000, 16000 Hz

#### **6.7 Discrete Sound Pressure Samples in Spherical Harmonics**

To determine the Ambisonic signals $\chi_{nm}$, we obviously need to find $\psi_{nm}$ based on all sound pressure samples $p(\boldsymbol{\theta}_i)$ recorded by the microphones distributed on the rigid-sphere array. To accomplish this, we set up a system of model equations equating the pressure samples to the unknown coefficients $\psi_{nm}$ expanded over the spherical harmonics $Y_n^m(\boldsymbol{\theta}_i)$ sampled at every microphone position. A vector and matrix notation $\boldsymbol{p} = [p(\boldsymbol{\theta}_i)]_i$ and $\boldsymbol{Y}_\mathrm{N}^\mathrm{T} = [\boldsymbol{y}(\boldsymbol{\theta}_i)^\mathrm{T}]_i$ is helpful

$$\begin{bmatrix} p(\boldsymbol{\theta}_1) \\ \vdots \\ p(\boldsymbol{\theta}_\mathrm{M}) \end{bmatrix} = \begin{bmatrix} Y_0^0(\boldsymbol{\theta}_1) & \dots & Y_\mathrm{N}^\mathrm{N}(\boldsymbol{\theta}_1) \\ \vdots & & \vdots \\ Y_0^0(\boldsymbol{\theta}_\mathrm{M}) & \dots & Y_\mathrm{N}^\mathrm{N}(\boldsymbol{\theta}_\mathrm{M}) \end{bmatrix} \begin{bmatrix} \psi_{00} \\ \vdots \\ \psi_\mathrm{NN} \end{bmatrix}, \qquad \boldsymbol{p} = \boldsymbol{Y}_\mathrm{N}^\mathrm{T}\, \boldsymbol{\psi}_\mathrm{N}. \tag{6.19}$$

*Left inverse (MMSE)*. The equation can be (pseudo-)inverted if the matrix $\boldsymbol{Y}_\mathrm{N}^\mathrm{T}$ is well-conditioned. Typically, more microphones are used than coefficients searched, $\mathrm{M} \ge (\mathrm{N}+1)^2$. Inversion is then a matter of mean-square error minimization: as the M dimensions may contain more degrees of freedom than $(\mathrm{N}+1)^2$, the coefficient vector $\boldsymbol{\psi}_\mathrm{N}$ giving the closest model $\boldsymbol{p}_\mathrm{N}$ to the measurement $\boldsymbol{p}$ is searched,

$$\min_{\boldsymbol{\psi}_\mathrm{N}} \|\boldsymbol{\varrho}\|^2, \qquad \text{with } \boldsymbol{\varrho} = \boldsymbol{p}_\mathrm{N} - \boldsymbol{p} = \boldsymbol{Y}_\mathrm{N}^\mathrm{T}\, \boldsymbol{\psi}_\mathrm{N} - \boldsymbol{p}. \tag{6.20}$$

The minimum-mean-square-error (MMSE) solution is, see Appendix A.4, Eq. (A.65),

$$\boldsymbol{\psi}\_{\rm{N}} = (\boldsymbol{Y}\_{\rm{N}} \boldsymbol{Y}\_{\rm{N}}^{\rm{T}})^{-1} \boldsymbol{Y}\_{\rm{N}} \ p = (\boldsymbol{Y}\_{\rm{N}}^{\rm{T}})^{\dagger} \ p. \tag{6.21}$$

The resulting left inverse $(\boldsymbol{Y}_\mathrm{N} \boldsymbol{Y}_\mathrm{N}^\mathrm{T})^{-1} \boldsymbol{Y}_\mathrm{N}$ inverts the thin matrix $\boldsymbol{Y}_\mathrm{N}^\mathrm{T}$ from the left. $(\boldsymbol{Y}_\mathrm{N}^\mathrm{T})^\dagger$ symbolizes the pseudo-inverse; it is a left inverse for thin matrices.

If the microphones are arranged in a *t*-design and the order N is chosen suitably, then the transposed matrix times $\frac{4\pi}{\mathrm{M}}$ is equivalent to the left inverse. A more thorough discussion of spherical point sets can be found in [17–19].
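A minimal sketch of this shortcut for N = 1 with real, orthonormal spherical harmonics: the six octahedron vertices form such a t-design, so $\frac{4\pi}{\mathrm{M}}\boldsymbol{Y}_\mathrm{N}$ acts as the left inverse of Eq. (6.21). The coefficient values below are illustrative:

```python
import math

# M = 6 microphone directions: the octahedron vertices, a spherical t-design
mics = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
M = len(mics)

def sh1(p):
    """Real, orthonormal spherical harmonics up to order N = 1."""
    c0 = math.sqrt(1.0 / (4 * math.pi))
    c1 = math.sqrt(3.0 / (4 * math.pi))
    return [c0, c1 * p[0], c1 * p[1], c1 * p[2]]   # omni + three dipoles

Y = [sh1(p) for p in mics]          # rows: microphones, columns: harmonics

# Synthesize pressure samples p = Y_N^T psi from known coefficients ...
psi = [0.7, -0.2, 0.5, 0.1]
p = [sum(Y[i][j] * psi[j] for j in range(4)) for i in range(M)]

# ... and recover them with (4 pi / M) Y_N in place of the pseudo-inverse
psi_hat = [4 * math.pi / M * sum(Y[i][j] * p[i] for i in range(M))
           for j in range(4)]
```

The shortcut works because, for such point sets, $\boldsymbol{Y}_\mathrm{N} \boldsymbol{Y}_\mathrm{N}^\mathrm{T} = \frac{\mathrm{M}}{4\pi}\,\boldsymbol{I}$, so the left inverse of Eq. (6.21) collapses to a scaled transpose.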

The *maximum determinant points* [20] are a particular kind of *critical* directional sampling scheme that allows using exactly as few microphones, $\mathrm{M} = (\mathrm{N}+1)^2$, as spherical harmonic coefficients obtained, yielding a well-conditioned square matrix $\boldsymbol{Y}_\mathrm{N}$, so that it can be inverted directly without left/pseudo-inversion. The 25 maximum-determinant points for N = 4 are used in the simulation example below.3

*Finite-order assumption and spatial aliasing*. An important implication of estimating $\psi_{nm}$ is that we need to assume that the distribution of the sound pressure is of limited spherical harmonic order on the measurement surface. This could be enforced by restricting the frequency range, as high-order harmonics are attenuated well enough below suitable frequency limits, cf. Fig. 6.5. However, low-pass-filtered signals are unacceptable in practice. Instead, one has to accept *spatial aliasing* at high frequencies, i.e. directional mapping errors and direction-specific comb filters. Figure 6.8 shows the spatial aliasing of $\boldsymbol{\psi}_\mathrm{N} = (\boldsymbol{Y}_\mathrm{N}^\mathrm{T})^{-1} \boldsymbol{p}$ in the angular domain $p = \sum \psi_{nm} Y_n^m$.

#### **6.8 Regularizing Filter Bank for Radial Filters**

The responses $\mathrm{i}^{n+1}\,[(k\mathrm{a})^2\, h_n'^{(2)}(k\mathrm{a})]^{-1}$ of Fig. 6.5 exhibit an $n$th-order zero at 0 Hz, $k\mathrm{a} = 0$. To retrieve the Ambisonic signals $\chi_{nm}$ from the sound pressure signals $\psi_{nm}$, their inverse would have an $n$-fold (unstable) pole at 0 Hz. Considering that microphone self-noise and array imperfections cause erroneous signals louder than the acoustically expected signals, which vanish with $n$th order around 0 Hz, such inverse filter shapes would moreover cause an excessive boost of erroneous signals unless implemented with precaution. Filters of the different orders $n$ must be stabilized by high-pass slopes of at least the order $n$, see also [6, 9, 21–25]. With $(n+1)$th-order high-pass slopes at exemplary cut-on frequencies of 90, 680, 1650, 2600 Hz for the Ambisonic orders 1, 2, 3, 4, see Fig. 6.9, such errors are cut off, yielding a

<sup>3</sup>md04.0025 on https://web.maths.unsw.edu.au/~rsw/Sphere/Images/MD/md\_data.html.

**Fig. 6.8** Interpolated plane-wave sound pressure image $\Re\{p\, e^{-\mathrm{i}k\mathrm{a}}\}$ on a rigid-sphere array with 25 microphones allowing decomposition up to the order N = 4, for the same sphere diameters $2\mathrm{a}$ of $\frac{1}{32}$ to $4$ wave lengths as in Fig. 6.6; simulation uses orders up to 25, and aliasing-free operation can only be expected within $kr < \mathrm{N}$

**Fig. 6.9** Radial filters $[(k\mathrm{a})^2\, h_n'^{(2)}(k\mathrm{a})]^{-1}$/dB over frequency/Hz, regularized with $(n+1)$th-order high-pass filters

**Fig. 6.10** Stabilizing filter bank/dB over frequency/Hz: signal orders *n* > *b* are excluded from the band *b*

noise boost of at most 20 dB for a 4th-order microphone with $\mathrm{a} = 4.2$ cm. However, just cutting on the frequencies of each order is not enough: every cut-on frequency causes a noticeable loudness drop below it, due to the discarded signal contributions. It is better to design a filter bank with crossovers instead, which allows compensating the loudness loss in every band. A zero-phase, $n$th-order Butterworth high-pass response is defined by $H_\mathrm{hi} = \frac{\omega^n}{1+\omega^n}$ and is amplitude-complementary to the low-pass $H_\mathrm{lo} = \frac{1}{1+\omega^n}$, so that $H_\mathrm{hi} + H_\mathrm{lo} = 1$.

Using this filter type, the filter bank in Fig. 6.10 can be constructed as follows: the band-pass filters $H_b(\omega)$ are composed of a $(b+1)$th-order high-pass skirt at $\omega_b$ and a $(b+2)$th-order low-pass skirt at $\omega_{b+1}$, respectively, except for the band $b = 0$ (low-pass only) and $b = \mathrm{N}$ (high-pass only)

$$\hat{H}_0(\omega) = \frac{1}{1 + \left(\frac{\omega}{\omega_1}\right)^{2}}, \qquad \hat{H}_b(\omega) = \frac{\left(\frac{\omega}{\omega_b}\right)^{b+1}}{1 + \left(\frac{\omega}{\omega_b}\right)^{b+1}}\, \frac{1}{1 + \left(\frac{\omega}{\omega_{b+1}}\right)^{b+2}}, \qquad \hat{H}_\mathrm{N}(\omega) = \frac{\left(\frac{\omega}{\omega_\mathrm{N}}\right)^{\mathrm{N}+1}}{1 + \left(\frac{\omega}{\omega_\mathrm{N}}\right)^{\mathrm{N}+1}}. \tag{6.22}$$

To make the bands perfectly reconstructing, filters are normalized by the sum response

$$H_b(\omega) = \frac{\hat{H}_b(\omega)}{\sum_{b'=0}^{\mathrm{N}} \hat{H}_{b'}(\omega)}. \tag{6.23}$$

By adjusting the cut-on frequencies ω*<sup>b</sup>* of the different orders *b* = 1,..., N, the noise and mapping behavior of the microphone array is adjusted; only the zeroth order is present in every band down to 0 Hz.
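A direct transcription of Eqs. (6.22) and (6.23) is short (zero-phase magnitude prototypes only; the cut-on frequencies used for testing below are the exemplary 90, 680, 1650, 2600 Hz values mentioned earlier) and shows the perfect-reconstruction property:

```python
def filter_bank(omega, cuts):
    """Normalized band responses H_b(omega) of Eqs. (6.22)-(6.23);
    cuts = [w_1, ..., w_N] are the cut-on frequencies of orders 1..N."""
    N = len(cuts)
    w = [None] + list(cuts)                        # w[b]: cut-on of band b
    H = [1.0 / (1.0 + (omega / w[1]) ** 2)]        # band b = 0: low-pass
    for b in range(1, N):
        hi = (omega / w[b]) ** (b + 1) / (1.0 + (omega / w[b]) ** (b + 1))
        lo = 1.0 / (1.0 + (omega / w[b + 1]) ** (b + 2))
        H.append(hi * lo)                          # band-pass skirts
    H.append((omega / w[N]) ** (N + 1)
             / (1.0 + (omega / w[N]) ** (N + 1)))  # band b = N: high-pass
    s = sum(H)                                     # Eq. (6.23): normalize
    return [h / s for h in H]
```

After normalization, the bands sum to one at every frequency, so the loudness lost at each cut-on can be compensated per band.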

This filter bank design moreover allows to adjust loudness and sidelobe suppression in every frequency band, separately.

#### **6.9 Loudness-Normalized Sub-band Side-Lobe Suppression**

The filter bank design shown above would only yield Ambisonic signals whose order increases with the frequency band. This variation of the order comes with the necessity of individual max-$r_\mathrm{E}$ sidelobe suppression in every band. Moreover, Ambisonic signals of different orders are differently loud, so diffuse-field equalization of the $E$ measure is also desirable in every band.

To fulfill the above constraints, we propose the following set of FIR filter responses, as given in [26, 27], modified by a filter bank employing diffuse-field-normalized max-$r_\mathrm{E}$ weights in the separate frequency bands $b = 0, \dots, \mathrm{N}$, cf. Fig. 6.11, with the $n$th order discarded in bands $b < n$:

$$\rho_n(\omega) = \left[\sum_{b=n}^{\mathrm{N}} a_{n,b}\, H_b(\omega)\right] \mathrm{i}^{-n-1}\,(k\mathrm{a})^2\, h_n'^{(2)}(k\mathrm{a})\, e^{\mathrm{i}k\mathrm{a}}. \tag{6.24}$$

Here, $e^{\mathrm{i}k\mathrm{a}}$ removes the linear phase of $h_n^{(2)}$, and $a_{n,b}$ is the set of diffuse-field ($\sqrt{E}$) equalized max-$r_\mathrm{E}$ weights for the band $b$, in which the Ambisonic orders retrieved are $0 \le n \le b$

$$a_{n,b} = \begin{cases} P_n\!\left(\cos\frac{137.9^\circ}{b+1.51}\right) \sqrt{\dfrac{\sum_{n=0}^{\mathrm{N}} (2n+1) \left[P_n\!\left(\cos\frac{137.9^\circ}{\mathrm{N}+1.51}\right)\right]^2}{\sum_{n=0}^{b} (2n+1) \left[P_n\!\left(\cos\frac{137.9^\circ}{b+1.51}\right)\right]^2}}, & \text{for } n \le b,\\[1em] 0, & \text{otherwise.} \end{cases} \tag{6.25}$$

Figure 6.12 shows the polar patterns of the corresponding direction-spread functions.
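The weights of Eq. (6.25) are easy to tabulate, and by construction every band carries the same diffuse-field energy $E = \sum_n (2n+1)\,a_{n,b}^2$ as the full-order band, which the self-contained sketch below verifies:

```python
import math

def legendre(n_max, mu):
    """Legendre polynomials P_0(mu)..P_{n_max}(mu) by recurrence."""
    P = [1.0, mu]
    for n in range(1, n_max):
        P.append(((2 * n + 1) * mu * P[n] - n * P[n - 1]) / (n + 1))
    return P[:n_max + 1]

def max_re_weights(N):
    """Table a[b][n] of diffuse-field equalized max-rE weights, Eq. (6.25)."""
    def raw(b):                                  # P_n(cos(137.9deg/(b+1.51)))
        mu = math.cos(math.radians(137.9 / (b + 1.51)))
        return legendre(b, mu)
    def energy(w):
        return sum((2 * n + 1) * w[n] ** 2 for n in range(len(w)))
    e_ref = energy(raw(N))                       # E of the full-order band
    a = []
    for b in range(N + 1):
        w = raw(b)
        g = math.sqrt(e_ref / energy(w))         # sqrt-E equalization
        a.append([g * w[n] if n <= b else 0.0 for n in range(N + 1)])
    return a
```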

For the implementation of $\rho_n(\omega)$ by fast block filtering, $\omega = 2\pi f$ and $k = \omega/c$ are uniformly sampled with frequency, and the inverse discrete Fourier transform yields the associated impulse responses (attention: the value at 0 Hz must be replaced for stable results, and cyclic time-domain shifts and windows are necessary).

The direction-spread function of a plane-wave sound pressure mapped to a directional Ambisonic signal becomes frequency-dependent as shown in Fig. 6.13, and it has minimal side lobes.

**Fig. 6.11** Filter-bank-regularized, diffuse-field-equalized, max-$r_\mathrm{E}$-weighted spherical microphone array responses/dB over frequency/Hz, using the radial filters $\rho_n(\omega)$ of Eq. (6.24)

**Fig. 6.12** Diffuse-field equalized (to *E* = 1) max-*r*<sup>E</sup> direction-spread functions; even orders are plotted on upper, odd orders on lower semi-circle

**Fig. 6.13** Direction spread/dB over frequency/Hz, in zenithal cross section/degrees, through the Ambisonic signal of the simulated microphone processing response to a plane wave from zenith, with the parameters $\mathrm{a} = 4.2$ cm, M = 25 mics., max-$r_\mathrm{E}$-weighted in bands with cut-on frequencies 90, 680, 1650, 2600 Hz for the orders 1, 2, 3, 4. Simulation is done with the order $\mathrm{N}_\mathrm{sim} = 30$, and spatial aliasing will occur above 5.2 kHz. Gain matching was assumed to be accurate to within ±0.5 dB; the map shows the direction spread normalized to its value at 0° for every frequency to make its shape easier to read

#### **6.10 Influence of Gain Matching, Noise, Side-Lobe Suppression**

Typical gain matching between the microphones is often no more accurate than ±0.5 dB. The result is that the physically dominant omnidirectional signal will leak into the higher-order signals by directionally random gain variations. Acoustically, however, higher-order components are expected to be weak and to require amplification.

**Fig. 6.14** Influence of carelessly selected cut-on frequencies for regularization (top) and of non-individual sidelobe suppression per band (middle), in contrast to ideal results with 50, 160, 500, 1600 Hz cut-on frequencies and individual max-$r_\mathrm{E}$ sidelobe suppression per band, assuming a perfect gain match (bottom); the maps show direction spreads normalized to their values at 0° for every frequency to make side lobes easier to read

The effect on the mapping is equivalent to that of microphone self-noise; however, gain mismatch yields an error correlated with the signal exciting the microphones, whereas self-noise yields low-frequency noise.

If the regularization filters were set to cut on at 50, 160, 500, 1600 Hz and the sidelobe suppression were turned off for testing, one would get the poor image of Fig. 6.14a, in which high-order signals at low frequencies are strongly boosted.

If a noise-free case is assumed, and only the max-*r*<sup>E</sup> side-lobe suppression of the highest band is used for all bands, one gets the image in Fig. 6.14b, which improves with individual max-*r*<sup>E</sup> weights in Fig. 6.14c.

*Self-noise behavior*. Assuming that the self-noise of the microphones is uncorrelated, it will also remain uncorrelated and of equal strength after decomposing the M microphone signals into the $(\mathrm{N}+1)^2$ spherical harmonic coefficient signals, with the noise power scaled by $\frac{(\mathrm{N}+1)^2}{\mathrm{M}}$, if $\mathrm{M} \approx (\mathrm{N}+1)^2$ and the microphone arrangement permits a well-conditioned pseudo-inversion $(\boldsymbol{Y}_\mathrm{N}^\mathrm{T})^\dagger$. The spectral change of the microphone self-noise due to the radial filters $\rho_n(\omega)$ can be described by the noise of the $(2n+1)$

**Fig. 6.15** Self-noise modification |*G*(ω)| 2/dB over frequency/Hz for the filter bank configurations using the cut on frequencies 2*k*, 3*k*, 4*k*, 5*k* (no noise amplification), 600, 2*k*, 3.5*k*, 4.2*k* (5 dB noise amplification), 280, 1.3*k*, 2.6*k*, 3.6*k* (10 dB noise amplification), 150, 950, 2*k*, 3.15*k* (15 dB noise amplification), and 90, 680, 1.65*k*, 2.6*k* (20 dB noise amplification)

signals of the same order, amplified by |ρ*n*(ω)| 2, in comparison to the zeroth-order signal:

$$|G(\omega)|^2 = \frac{\sum\_{n=0}^{N} (2n+1) |\rho\_n(\omega)|^2}{|(k\mathbf{a})^2 \, h\_0'^{(2)}(k\mathbf{a})|^2}. \tag{6.26}$$

Figure 6.15 analyzes the noise amplification for the simulation example (max-$r_\mathrm{E}$ weighting in each sub-band, $\mathrm{a} = 4.2$ cm) and shows the dependency on exemplary cut-on frequencies configured to tune the filter bank to 0, 5, 10, 15, and 20 dB noise boosts. The trade-off here is: the more noise boost one can allow, the more directional resolution one gets, see Fig. 6.16.

Open measurement data (SOFA format) characterizing the directivity patterns of the 32 Eigenmike em32 transducers are provided under the link http://phaidra.kug.ac.at/o:69292. They were measured on a 12° × 11.25° azimuth × zenith grid, yielding 480 impulse responses of 256 points for each of the 32 transducers.

#### **6.11 Practical Free-Software Examples**

#### *6.11.1 Eigenmike Em32 Encoding Using Mcfx and IEM Plug-In Suites*

We give a practical signal processing example for the Eigenmike em32 which is applicable e.g. in digital audio workstations. First the 32 signals are encoded by matrix multiplication (IEM MatrixMultiplier), cf. Fig. 6.17a, yielding 25 fourth-order signals. The preset (json file) is provided online http://phaidra.kug.ac.at/o:79231. The radial filtering that sharpens the surround sound image uses mcfx-convolver, see Fig. 6.17b, with 25 SISO filters, one for each Ambisonic signal, using the 5 different filter curves for the orders *n* = 0,..., 4 as defined above. The convolver presets (wav

**Fig. 6.16** Direction spread/dB for over frequency/Hz and zenith/degrees of filterbank with different settings to achieve 0, 5, 10, 15, 20 dB noise boosts; the maps show direction spreads normalized to their values at 0◦ at every frequency as above

**Fig. 6.17** IEM MatrixMultiplier encoding the Eigenmike em32 signals and mcfxconvolver applying radial filters to encoded em32 recording

**Fig. 6.18** Practical equalization of the em32 transducer characteristics by two parametric shelving filters of the mcfx\_filter, cf. [28]

files and config files for mcfx-convolver) are provided online http://phaidra.kug. ac.at/o:79231 and are available for the different noise boosts 0, 5, 10, 15, 20 dB. As found in [28], the em32 transducers exhibit a frequency response that favors low frequencies and attenuates high frequencies. This behavior is sufficiently well equalized in practice using two parametric shelving filters, a low shelf at 500 Hz with a gain of −5 dB, and a high shelf at 5 kHz using a gain of +5 dB, see Fig. 6.18.

#### *6.11.2 SPARTA Array2SH*

The SPARTA suite by Aalto University includes the Array2SH plug-in shown in Fig. 6.19 to convert the transducer signals of a microphone array into Ambisonics. It provides both encoding of the signals as well as calculation and application of radial-focusing filters based on the geometry of the array. It supports rigid and open arrays

**Fig. 6.19** SPARTA Array2SH encoding for, e.g., em32

and comes with presets for several arrays, such as the Eigenmike em32. The plug-in allows adjusting the radial filters in terms of regularization type and maximum gain. The Reg. Type called Z-Style corresponds to the linear-phase design of Sect. 6.9.

#### **References**


6 Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Chapter 7 Compact Spherical Loudspeaker Arrays**

*The loudspeaker anonymizes the real source.* Pierre Boulez [1], ICA, 1983. *…adjustable radiation naturalize[s] alien sounds by embedding them in the natural spatiality of the room.* OSIL project [2], InSonic, 2015.

**Abstract** This chapter introduces auditory objects that can be created by adjustable-directivity sources in rooms. After showing basic positioning properties in distance and direction, we describe physical first- and higher-order spherical loudspeaker arrays and their control, such as the loudspeaker cubes or the icosahedral loudspeaker (IKO). Not only static auditory objects but also those traversing space by time-varying beamforming are considered here. Signal dependency and different practical setups are discussed and briefly analyzed. This young Ambisonic technology brings new means of expression to sound reinforcement and electroacoustic or computer music.

While surrounding Ambisonic loudspeaker arrays play sound from outside the listening area into the audience, compact spherical loudspeaker arrays play sound into the room from a single position. Directivity adjustable in orientation and shape can be used to steer sound beams in order to excite wall reflections in the given, acoustic environment. The directional shapes and orientations of such beams are all controlled by—guess what—Ambisonic signals. Despite the huge practical difference, both applications do not only share the spherical harmonics that lend their shapes to Ambisonic signals: The control of radiating sound beams employs nearly the same model- or measurement-based radial steering filters as those of compact higher-order Ambisonic microphones.

The works of Warusfel [3], Kassakian [4], Avizienis [5], Zotter [6, 7], Pomberger [8], Pollow [9], and Mattioli Pasqual [10] established the electroacoustic background technology required to describe compact spherical loudspeaker arrays built with electrodynamic transducers. Early works on auditory objects were written by Schmeder [11] and by Sharma, Frank, and Zotter [2, 12, 13], and contemporary results were obtained in the project "Orchestrating the Space by Icosahedral Loudspeaker" (OSIL) between 2015 and 2018 [14–19].

#### **7.1 Auditory Events of Ambisonically Controlled Directivity**

#### *7.1.1 Perceived Distance*

Laitinen showed in [20] that increasing the directivity of a listener-facing loudspeaker array from omnidirectional to second order was able to create auditory events that were perceptually closer than the physical distance of the loudspeaker array. The experimental results can be explained by the increase of the direct-to-reverberant energy ratio, as the sound beam of the directional source excites room reflections less strongly.

Wendt extended Laitinen's work by experiments employing a simulation of a third-order directional source in a virtual room (third-order image-source model) played back by a loudspeaker ring in an anechoic room [16]. He showed that the perceived distance between the listener and the higher-order directional source could be controlled not only by the order of the directivity pattern but also by the orientation of the source (towards or away from the listener). Beams projecting sounds away from the listener were perceived behind the source, cf. Fig. 7.1. Again, the perceptual results could be modeled by simple measures known from room acoustics.

#### *7.1.2 Perceived Direction*

Using a similar room simulation, the study in [21] asked participants to indicate the perceived direction of an auditory event created by a third-order directional source. The results showed that for different source orientations, listeners perceived auditory objects at directions that often did not coincide with the sound source but with the delayed reflection paths, cf. Fig. 7.2. Perceived directions focused on the direct sound and the three first reflections after 6, 8, and 9 ms. For some orientations, even

**Fig. 7.2** IKO perceived directions (black circles, radii indicate relative amount of answers) and modeling (gray crosses), 3rd-order max-*r*<sup>E</sup> beam, 2nd-order image source model. Gray shading in the background indicates level of each path

the second-order reflections at 12 and 14 ms dominated localization. However, the influence of later reflections is reduced by the precedence effect. The perceived directions can be modeled by the extended energy vector originally developed for off-center listening positions in surrounding loudspeaker arrangements, as also shown in [17]. Experiments in [22] showed that panning between a reflection and the direct sound creates auditory objects in between. When the direct sound is given an appropriate delay and gain to compensate for the longer path of the reflection, the localization curves are similar to those of standard stereo using a pair of loudspeakers.

#### **7.2 First-Order Compact Loudspeaker Arrays and Cubes**

The simplest practical way of creating a loudspeaker array with adjustable directivity is a cube with loudspeakers on its plane surfaces, as suggested by Misdariis [23]. Restricting the directivity control to two dimensions reduces the number of loudspeaker drivers to four and makes it easy to equip the array with a carrying handle on top and a flange adapter at the bottom, cf. [24] and Fig. 7.3.

*Directivity control*. First-order Ambisonics utilizes monopole and dipole modes, which directly translate to the corresponding far-field radiation patterns. Due to the cubic shape, these modes can easily be created by playing either all four drivers in phase or opposing drivers out of phase, cf. Fig. 7.4. Nevertheless, the frequency responses of such monopole and dipole modes need to be equalized to enable their phase- and magnitude-aligned superposition in the far field. Filters and measurement data of cube loudspeakers built at IEM [24] are freely available at http://phaidra.kug.ac.at/o:67631.

**Fig. 7.3** Design of a loudspeaker cube: prototype, and vertical and horizontal cross section plots

**Fig. 7.4** System controlling the monopole and dipole modes of the loudspeaker cubes, to accomplish first-order beamforming with the shape parameter α and beam direction ϕ<sup>0</sup>

To overcome the compressive effort of interior volume changes at low frequencies, the filter *H*bctl in Fig. 7.4 first equalizes the smaller velocity of the loudspeaker cones when driven omnidirectionally to the velocity when driven as dipoles, and second, it slightly attenuates the monopole pattern to account for its more efficient radiation at low frequencies. The filter *H*EQ is a general equalizer required to obtain a flat frequency response, 0 ≤ α ≤ 1 is a first-order omni-to-dipole beamshape parameter, and ϕ<sup>0</sup> is the beam direction. The filter *H*bctl can be specified as a 5th-order IIR filter purely based on geometric and electroacoustic parameters [19].
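As a rough numerical illustration of the beamforming in Fig. 7.4, the following sketch blends the monopole and dipole modes into first-order driver gains. It assumes driver azimuths at 0°, 90°, 180°, and 270° and idealized, already-equalized modes, i.e. the filters *H*bctl and *H*EQ are deliberately omitted:

```python
import numpy as np

def cube_gains(alpha, phi0):
    """First-order beamforming gains for a 4-driver loudspeaker cube.

    alpha = 0 yields the monopole (all drivers in phase), alpha = 1 the
    dipole (opposing drivers out of phase); in-between values blend both.
    phi0 is the beam azimuth in radians. Idealized-mode sketch only.
    """
    phi_l = np.radians([0.0, 90.0, 180.0, 270.0])  # assumed driver azimuths
    return (1.0 - alpha) + alpha * np.cos(phi0 - phi_l)

print(cube_gains(0.0, 0.0))  # monopole: all drivers equal
print(cube_gains(1.0, 0.0))  # dipole towards 0°: opposing drivers out of phase
```

For α = 1 and ϕ0 = 0, the gains become [1, 0, −1, 0], i.e. the two drivers on the beam axis work in opposite phase while the lateral pair is silent, matching the dipole mode of Fig. 7.4.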

**Direct and indirect sound with two cubes**. The study in [19] examined the width of the listening area for the creation of a central auditory object between a pair of loudspeaker cubes, cf. Fig. 7.5. Steering the two beams directly at the listener yielded a narrow listening area that increased with the distance to the loudspeakers, similar to typical stereo setups, cf. Fig. 2.9. A much wider listening area is achieved by steering the beams to the front wall to excite reflections. To this end, max-*r*<sup>E</sup> (super-cardioid) beams were chosen and oriented so as to ideally suppress direct sound from the loudspeaker cubes at the listening position. The proposed setup of two loudspeaker cubes can be used to play back stable L, C, R channels of a surround production without the need for an actual center loudspeaker.

*Surround with depth*: Together with the distance control described by Laitinen [20], the stable in-between auditory image has been used in [19] to establish a *surround-*

**Fig. 7.5** Width of the listening area for a central auditory object at two distances from a pair of loudspeaker cubes with different orientation of max-*r*E/super-cardioid beams

*with-depth* system consisting of a quadraphonic setup of four loudspeaker cubes. As a first layer, it uses the direct sounds from the 4 loudspeakers at ±45◦ and ±135◦ together with the 4 in-between images at 0◦, ±90◦, and 180◦ to obtain 8 directions for third-order Ambisonic surround panning. As a second layer for depth, *surround with depth* uses 4 cardioid beams pointing into the 4 room corners to provide the impression of distant sounds. Blending between those two layers is used to control the distance impression of surround sounds.

#### **7.3 Higher-Order Compact Spherical Loudspeaker Arrays and IKO**

With transducers mounted on spheres or polyhedra, higher-order radiators can be built. Typically, these are Platonic solids such as dodecahedra or icosahedra, as they can easily be manufactured from equal-sided polygons, cf. Fig. 7.6. Often, the loudspeakers are mounted onto a common interior volume. Hereby, the higher-order modes can be controlled at a reduced impedance of the inner stiffness; however, this also causes acoustic coupling of the transducer motions. Typically, multiple-input-multiple-output (MIMO) crosstalk cancellers are employed to suppress the coupling and to control the velocity of the transducer cones. If this is accomplished, the acoustic radiation can be modeled and equalized by the spherical cap model, cf. [6, 15, 25, 26].

**Fig. 7.6** Powerful icosahedral loudspeaker array (IKO by IEM and Sonible) and reflecting baffles in Ligeti concert hall, in preparation of an electroacoustic music concert

*Cap model*. Higher-order loudspeaker arrays on a compact spherical housing are modeled by the spherical cap model. For the exterior air, it assumes the radial surface velocity as a boundary condition consisting of separated spherical cap shapes of aperture α centered around the directions {**θ***l*}, each unity in value. These idealized transducer shapes driven by the transducer velocities *vl* compose the surface velocity

$$v(\boldsymbol{\theta}) = \sum_{l=0}^{\mathrm{L}} u(\boldsymbol{\theta}_l^{\mathrm{T}}\boldsymbol{\theta} - \cos\tfrac{\alpha}{2})\, v_l. \tag{7.1}$$

Here, *u*(ζ ) denotes the unit step function that is unity for ζ ≥ 0 and zero otherwise. The surface velocity distribution can be decomposed into spherical harmonics as

$$v(\boldsymbol{\theta}) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}) \sum_{l=0}^{\mathrm{L}} w_{nm}^{(l)}\, v_l. \tag{7.2}$$

The coefficients $w_{nm}^{(l)}$ of the $l$th cap are defined by the spherical convolution Eq. (A.56) of a Dirac delta $\delta(\boldsymbol{\theta}_l^{\mathrm{T}}\boldsymbol{\theta} - 1)$ pointing to the cap center with a zenithal cap $u(\cos\vartheta - \cos\frac{\alpha}{2})$:

$$\boldsymbol{w}\_{nm}^{(l)} = \boldsymbol{w}\_n \; Y\_n^m(\boldsymbol{\theta}\_l),\tag{7.3}$$

where $Y_n^m(\boldsymbol{\theta}_l)$ are the coefficients expressing the Dirac delta, extended to a cap by weighting with $w_n$. The term $w_n = 2\pi \int_{\cos\frac{\alpha}{2}}^{1} P_n(\zeta)\,\mathrm{d}\zeta$ is derived in Eq. (A.60)

$$w_n = 2\pi \begin{cases} -\frac{P_{n+1}(\cos\frac{\alpha}{2}) - \cos\frac{\alpha}{2}\, P_n(\cos\frac{\alpha}{2})}{n}, & \text{for } n > 0, \\ 1 - \cos\frac{\alpha}{2}, & \text{for } n = 0. \end{cases} \tag{7.4}$$
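The closed form of Eq. (7.4) is easy to cross-check numerically. The following sketch (with an arbitrary example aperture) compares it against direct integration of the Legendre polynomial:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

def cap_weight(n, alpha):
    """Cap coefficient w_n of Eq. (7.4): the closed form of
    2*pi * integral of P_n(z) from cos(alpha/2) to 1."""
    c = np.cos(alpha / 2.0)
    if n == 0:
        return 2.0 * np.pi * (1.0 - c)
    return -2.0 * np.pi * (eval_legendre(n + 1, c) - c * eval_legendre(n, c)) / n

# cross-check against direct numerical integration of P_n
alpha = np.radians(63.4)  # arbitrary example aperture
for n in range(6):
    numeric = 2.0 * np.pi * quad(lambda z: eval_legendre(n, z),
                                 np.cos(alpha / 2.0), 1.0)[0]
    assert abs(cap_weight(n, alpha) - numeric) < 1e-10
```

For α = π (hemispherical cap), the weights reduce to simple values, e.g. w₀ = 2π and w₁ = π, which the closed form reproduces.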

*Decoder*. Without radiation control yet, any low-order target spherical harmonic $n \le \mathrm{N}$ can be synthesized as velocity pattern $\phi_{nm}$ by superimposing the spherical cap coefficients $w_{nm}^{(l)}$ with suitable transducer velocities $v_l$, i.e. $\phi_{nm} = \sum_l w_n\, Y_n^m(\boldsymbol{\theta}_l)\, v_l$. We write a matrix/vector notation with the matrix $\boldsymbol{Y} = [\boldsymbol{y}(\boldsymbol{\theta}_1), \ldots, \boldsymbol{y}(\boldsymbol{\theta}_\mathrm{L})]$ containing the spherical harmonics $\boldsymbol{y}(\boldsymbol{\theta}) = [Y_n^m(\boldsymbol{\theta})]_{nm}$ sampled at the transducer positions $\{\boldsymbol{\theta}_l\}$ to represent Dirac deltas pointing there, and $\boldsymbol{w} = [w_n]_{nm}$ to represent the cap shape,

$$
\boldsymbol{\phi} = \mathrm{diag}\{\boldsymbol{w}\}\, \boldsymbol{Y}\, \boldsymbol{v}. \tag{7.5}
$$

As long as the order N up to which coefficients are controlled is low enough, L ≥ (N + 1)<sup>2</sup>, and the transducers are well-distributed, perfect control is feasible. The corresponding velocities are found by solving a least-squares problem, see Appendix A.4, Eq. (A.63), yielding the right inverse of the Nth-order cap-coefficient matrix,

$$\boldsymbol{v} = \boldsymbol{Y}_{\mathrm{N}}^{\mathrm{T}} (\boldsymbol{Y}_{\mathrm{N}} \boldsymbol{Y}_{\mathrm{N}}^{\mathrm{T}})^{-1}\, \mathrm{diag}\{\boldsymbol{w}_{\mathrm{N}}\}^{-1} \boldsymbol{\phi}_{\mathrm{N}} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{w}_{\mathrm{N}}\}^{-1} \boldsymbol{\phi}_{\mathrm{N}}. \tag{7.6}$$

The right inverse $\boldsymbol{D} = \boldsymbol{Y}_{\mathrm{N}}^{\mathrm{T}}(\boldsymbol{Y}_{\mathrm{N}}\boldsymbol{Y}_{\mathrm{N}}^{\mathrm{T}})^{-1}$ is a mode-matching decoder, cf. Eq. (4.40).
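As a minimal numerical sketch of the mode-matching decoder, the following code restricts itself to first order and uses an assumed random but well-spread transducer layout instead of a real array geometry; it verifies the right-inverse property of Eq. (7.6):

```python
import numpy as np

def sh_matrix_first_order(dirs):
    """Real spherical harmonics up to N = 1 (ACN order, N3D normalization)
    sampled at unit-vector directions dirs of shape (L, 3)."""
    x, y, z = dirs.T
    return np.stack([np.ones_like(x),      # Y_0^0
                     np.sqrt(3.0) * y,     # Y_1^-1
                     np.sqrt(3.0) * z,     # Y_1^0
                     np.sqrt(3.0) * x])    # Y_1^1

rng = np.random.default_rng(1)
dirs = rng.normal(size=(12, 3))            # 12 assumed transducer directions
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

Y = sh_matrix_first_order(dirs)            # (N+1)^2 x L = 4 x 12 matrix
D = Y.T @ np.linalg.inv(Y @ Y.T)           # mode-matching decoder, Eq. (7.6)

# right-inverse property: Y_N D = I, so the decoder reproduces any
# first-order velocity pattern exactly
assert np.allclose(Y @ D, np.eye(4))
```

With a real array, the sampled directions would be the cap centers of the housing (e.g. the 20 faces of the IKO), and the decoder would additionally be combined with the inverse cap weights diag{**w**}⁻¹.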

*Exterior problem*. The radiated sound pressure is described by the exterior problem denoted by the coefficients $c_{nm}$ in Eq. (6.13) and the spherical Hankel functions $h_n^{(2)}(kr)$. To relate it to the surface velocity at the array radius $r = \mathrm{a}$, we differentiate the exterior solution with regard to the radius, $\frac{\partial p}{\partial r} = k\, \frac{\partial p}{\partial kr} = -\mathrm{i}kc\rho\, v$, cf. Eq. (6.2),

$$v(\boldsymbol{\theta}) = \frac{\mathrm{i}}{\rho c} \sum_{n=0}^{\infty} \sum_{m=-n}^{n} h_n'^{(2)}(k\mathrm{a})\, Y_n^m(\boldsymbol{\theta})\, c_{nm}. \tag{7.7}$$

Comparing Eq. (7.2) to Eq. (7.7) yields $c_{nm} = \rho c\, [\mathrm{i}\, h_n'^{(2)}(k\mathrm{a})]^{-1} \sum_{l=0}^{\mathrm{L}} w_n\, Y_n^m(\boldsymbol{\theta}_l)\, v_l$, the coefficients to calculate the radiated pressure. Far away, we replace the spherical Hankel function, which approaches $h_n^{(2)}(kr) \to \mathrm{i}^{n+1} (kr)^{-1} e^{-\mathrm{i}kr}$, by the term $\mathrm{i}^{n+1} k^{-1}$ in Eq. (6.13), so that the radiated far-field sound pressure $p \propto \sum \mathrm{i}^{n+1} k^{-1}\, Y_n^m\, c_{nm}$ becomes

$$p(\boldsymbol{\theta}) \propto \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} Y\_n^m(\boldsymbol{\theta}) \frac{\mathbf{i}^n \ w\_n}{k \, h\_n^{\prime(2)}(k\mathbf{a})} \sum\_{l=0}^{\mathcal{L}} Y\_n^m(\boldsymbol{\theta}\_l) \, \boldsymbol{\nu}\_l. \tag{7.8}$$

#### *7.3.1 Directivity Control*

The spherical harmonics coefficients of the far-field sound pressure pattern in Eq. (7.8) are controlled by the cap velocities *vl*

$$\psi_{nm} = \frac{\mathrm{i}^n\, w_n}{k\, h_n'^{(2)}(k\mathrm{a})} \sum_{l=0}^{\mathrm{L}} Y_n^m(\boldsymbol{\theta}_l)\, v_l, \tag{7.9}$$

and we desire to form the directional sound beam they represent according to a max-$r_\mathrm{E}$ pattern $a_n\, Y_n^m(\boldsymbol{\theta}_0)$, yielding radiation focused towards $\boldsymbol{\theta}_0$

$$
\psi\_{nm} = a\_n \, Y\_n^m(\theta\_0). \tag{7.10}
$$

To find suitable cap velocities $v_l$, we equate the models Eqs. (7.9) and (7.10). In matrix/vector notation, the equation is

$$\text{diag}\{\text{i}^{n}w\_{n}\text{ }k^{-1}/h\_{n}^{\prime(2)}(k\mathbf{a})\}\_{nm}\text{ }\mathbf{Y}\mathbf{v}=\text{diag}\{[a\_{n}]\_{nm}\}\mathbf{y}(\theta\_{0}).\tag{7.11}$$

The diagonal matrix on the left is easy to invert, and for patterns up to the order *n* ≤ N, the mode-matching decoder *D* of Eq. (7.6) already gives us a way to define velocities inverting the matrix *Y* <sup>N</sup> from the right. The preliminary solution becomes

$$\boldsymbol{v} = \boldsymbol{D}\, \mathrm{diag}\{[\mathrm{i}^{-n}\, w_n^{-1}\, k\, h_n'^{(2)}(k\mathrm{a})\, a_n]_{nm}\}\, \boldsymbol{y}_{\mathrm{N}}(\boldsymbol{\theta}_0). \tag{7.12}$$

*On-axis equalized, sidelobe-suppressing directivity control limiting the excursion*. The inverse cap shape coefficient $w_n^{-1}$ and the max-$r_\mathrm{E}$ weight $a_n$ can be regarded as part of the radiation control filters $\mathrm{i}^{-n}\, k\, h_n'^{(2)}(k\mathrm{a})$. The corresponding expression $\mathrm{i}^{-n-1}(k\mathrm{a})^2\, h_n'^{(2)}(k\mathrm{a})$ of compact spherical microphone arrays (Sect. 6.6) qualitatively differs by a factor $k$. Practical implementation of radiation control filters and their regularization is therefore quite similar to the radial filters of spherical microphone arrays. There are three main differences, as explained in [15]:


On-axis equalization yields a different scaling of the sub-band max-*r*<sup>E</sup> weights

$$a\_{n,b} = \begin{cases} P\_n(\cos\frac{137.9^\circ}{b+1.51}) \frac{\sum\_{n=0}^{N} (2n+1) P\_n\left(\cos\frac{137.9^\circ}{N+1.51}\right)}{\sum\_{n=0}^{b} (2n+1) P\_n\left(\cos\frac{137.9^\circ}{b+1.51}\right)}, & \text{for } n \le b, \\ 0, & \text{otherwise.} \end{cases} \tag{7.13}$$
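Equation (7.13) can be sketched numerically as follows; the code (the helper name `subband_max_re` is hypothetical) also checks the on-axis equalization, i.e. that every band delivers the same on-axis sum:

```python
import numpy as np
from scipy.special import eval_legendre

def subband_max_re(n, b, N):
    """Sub-band max-rE weight a_{n,b} of Eq. (7.13); zero for n > b."""
    if n > b:
        return 0.0
    def c(order):  # cosine of the max-rE limit angle for a given band order
        return np.cos(np.radians(137.9) / (order + 1.51))
    scale = (sum((2 * k + 1) * eval_legendre(k, c(N)) for k in range(N + 1))
             / sum((2 * k + 1) * eval_legendre(k, c(b)) for k in range(b + 1)))
    return eval_legendre(n, c(b)) * scale

# on-axis equalization: the sum over (2n+1) a_{n,b} is the same in every band
N = 3
onaxis = [sum((2 * n + 1) * subband_max_re(n, b, N) for n in range(b + 1))
          for b in range(N + 1)]
assert np.allclose(onaxis, onaxis[0])
```

For the top band b = N, the scaling factor becomes unity, so the weights reduce to the plain max-*r*E weights of the full-order beam.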

Typically, cut-on frequencies for compact spherical loudspeaker arrays are low, and linear-phase filterbanks would require long pre-delays. It is useful to employ Linkwitz-Riley filters for the crossovers to get a low-latency implementation. To emphasize the similarity to Eq. (6.22), we write Linkwitz-Riley filters [27] as the combination of an all-pass $A^m$ with twice the phase response of an $m$th-order Butterworth low-pass, combined either with the magnitude-squared low-pass response $[1 + (\omega/\omega_c)^{2m}]^{-1}$ or high-pass response $(\omega/\omega_c)^{2m}[1 + (\omega/\omega_c)^{2m}]^{-1}$ per crossover. Such a minimum-phase crossover is of even order, so that the minimum cut-on slope must be rounded up to the next even order $2\lceil\frac{b+3}{2}\rceil$. Plain high/low crossovers would be in-phase unless combined with further crossovers to form narrower bands. However, an in-phase filterbank is obtained after inserting the product of all all-passes in every band, cf. [28]. Although non-minimum-phase, this is still low-latency. For the band $b$ containing Ambisonic orders $0 \le n \le b$, the modified filterbank is

$$H\_b(\omega) = \frac{\left(\frac{\omega}{\omega\_b}\right)^{2\lceil \frac{b+3}{2} \rceil}}{1 + \left(\frac{\omega}{\omega\_b}\right)^{2\lceil \frac{b+3}{2} \rceil}} \frac{1}{1 + \left(\frac{\omega}{\omega\_{b+1}}\right)^{2\lceil \frac{b+4}{2} \rceil}} \prod\_{b'=0}^{N} A\_{\omega\_{b'}}^{\lceil \frac{b'+3}{2} \rceil}(\omega). \tag{7.14}$$

The sum $\sum_b H_b(\omega)$ is considered to be sufficiently flat, so that the radial filters for compact spherical loudspeaker arrays using Eqs. (7.8), (7.13), (7.14) become

$$\rho_n(\omega) = \left[\sum_{b=n}^{N} a_{n,b}\, H_b(\omega)\right] \mathrm{i}^{-n}\, w_n^{-1}\, k\, h_n'^{(2)}(k\mathrm{a})\, e^{\mathrm{i}k\mathrm{a}}. \tag{7.15}$$
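A magnitude-only sketch of the filterbank of Eq. (7.14) illustrates that the band magnitudes sum up approximately flat; the all-passes have unit magnitude and are omitted, and the filterbank frequencies of Fig. 7.9 are assumed:

```python
import numpy as np

def band_magnitude(f, b, fc):
    """Magnitude response of band b in Eq. (7.14); the all-passes A have
    unit magnitude and are left out of this sketch."""
    N = len(fc) - 1
    m_on = 2 * int(np.ceil((b + 3) / 2))       # cut-on slope of band b
    x = (f / fc[b]) ** m_on
    h = x / (1.0 + x)                          # high-pass part at omega_b
    if b < N:                                  # low-pass towards band b + 1
        m_off = 2 * int(np.ceil((b + 4) / 2))
        h = h / (1.0 + (f / fc[b + 1]) ** m_off)
    return h

fc = [38.0, 75.0, 125.0, 210.0]                # filterbank frequencies of Fig. 7.9
f = np.geomspace(80.0, 8000.0, 400)
total = sum(band_magnitude(f, b, fc) for b in range(4))
assert np.all(np.abs(total - 1.0) < 0.15)      # magnitude sum is roughly flat
```

Because adjacent bands use matching crossover orders, each high-pass/low-pass pair is exactly complementary in magnitude, which keeps the deviation from a flat sum small above the lowest cut-on frequency.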

Figure 7.7 shows the block diagram to control compact spherical loudspeaker arrays by Ambisonic input signals, including the radiation control filters, the decoder, and a voltage-equalizing crosstalk canceller feeding the loudspeakers.

**Fig. 7.7** Signal processing for higher-order compact spherical loudspeaker array control: the Ambisonic directivity signals *χ*N(*t*) run through radiation control filters ρ*n*(ω), a decoder yielding desired velocities *v*(*t*), and a crosstalk canceller/equalizer provides suitable output voltages *u*(*t*)

#### *7.3.2 Control System and Verification Based on Measurements*

*Velocity equalization/crosstalk cancellation*. In the frequency domain, laser vibrometer measurements, cf. Fig. 7.8a, characterize the physical multiple-input-multiple-output (MIMO) system from transducer input voltages *ul*(ω) to transducer velocities *vl*(ω)

$$\mathbf{v}(\omega) = T(\omega)\,\mathfrak{u}(\omega),\tag{7.16}$$

including the effect of acoustic coupling through the common enclosure. Corresponding open measurement data sets<sup>1</sup> can be found online, as described in [18]. Theoretically, the frequency-domain inverse of the matrix *T*(ω) can be used to equalize and control the transducer velocities with the acoustic crosstalk cancelled, as indicated in Fig. 7.7,

$$\mathfrak{u}(\omega) = T^{-1}(\omega)\,\mathfrak{v}(\omega). \tag{7.17}$$

In practice, this is only useful up to the frequency at which the loudspeaker cone vibration breaks up into modes, so typically below 1 kHz.

*Control system*: The entire control system with Ambisonic signals *χ*N(ω) as inputs uses Eqs. (7.6), (7.15), (7.17)

$$\boldsymbol{u}(\omega) = \boldsymbol{T}^{-1}(\omega)\, \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{\rho}(\omega)\}\, \boldsymbol{\chi}_{\mathrm{N}}(\omega). \tag{7.18}$$
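The per-bin structure of Eq. (7.18) can be sketched as follows; note that the decoder, radial filters, and MIMO system matrix are random placeholders here, not measured IKO data:

```python
import numpy as np

rng = np.random.default_rng(7)
n_sh, n_ls, n_bins = 16, 20, 4          # 3rd-order SH channels, IKO transducers

D = rng.normal(size=(n_ls, n_sh))       # placeholder decoder, cf. Eq. (7.6)
rho = rng.normal(size=(n_sh, n_bins)) + 1j * rng.normal(size=(n_sh, n_bins))
T = rng.normal(size=(n_bins, n_ls, n_ls)) + 1j * rng.normal(size=(n_bins, n_ls, n_ls))
chi = rng.normal(size=(n_sh, n_bins))   # Ambisonic input spectra

# Eq. (7.18) per frequency bin: u = T^{-1} D diag{rho} chi
u = np.stack([np.linalg.solve(T[k], D @ (rho[:, k] * chi[:, k]))
              for k in range(n_bins)], axis=1)

# verification: feeding u through the system T reproduces the desired
# transducer velocities v = D diag{rho} chi in every bin
v = np.stack([D @ (rho[:, k] * chi[:, k]) for k in range(n_bins)], axis=1)
Tu = np.stack([T[k] @ u[:, k] for k in range(n_bins)], axis=1)
assert np.allclose(Tu, v)
```

In practice the inversion of *T*(ω) is regularized and only applied below the cone break-up frequency, as described above.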

*Directivity measurement*. It is useful to characterize the obtained directivity by measurements to verify the results; high-resolution 648 × 20 measurements *G*(ω) of the IKO are found online<sup>1</sup>. With the known directional sampling on a 10◦ × 10◦ grid in azimuth and zenith, the sound pressure can be decomposed into spherical harmonics up to 17th order by left-inversion of the spherical harmonics matrix $\boldsymbol{Y}_{17}^{\mathrm{T}}$, see Appendix A.4, Eq. (A.65):

$$\boldsymbol{p}(\omega) = \boldsymbol{G}(\omega)\, \boldsymbol{u}(\omega), \qquad \Rightarrow\ \boldsymbol{\psi}_{17}(\omega) = (\boldsymbol{Y}_{17}^{\mathrm{T}})^{\dagger}\, \boldsymbol{p}(\omega). \tag{7.19}$$

With the highly resolved spherical harmonics coefficients, polar diagrams or balloon diagrams can be evaluated at any direction

$$p(\boldsymbol{\theta},\omega) = \boldsymbol{y}_{17}(\boldsymbol{\theta})^{\mathrm{T}}\, \boldsymbol{\psi}_{17}(\omega), \tag{7.20}$$

given any control system delivering suitable voltages *u* for beamforming, as e.g. obtained by Eq. (7.18).

<sup>1</sup>http://phaidra.kug.ac.at/o:67609.

**Fig. 7.8** Measurements on the IKO as a MIMO system in terms of transducer output velocities (left) and radiation patterns (right) depending on the transducer input voltages

**Fig. 7.9** Horizontal cross section of the IKO's directivity/dB over frequency/Hz and azimuth/degrees when beamforming to 0◦ azimuth on the horizon, with radiation control filters as above and filterbank frequencies (38, 75, 125, 210) Hz

To inspect the frequency-dependent directivity, a horizontal cross section is shown in Fig. 7.9. The beamforming becomes effective above 100 Hz, and a beam width of ±30◦ is maintained up to 2 kHz. The filterbank starts the 0th order above 38 Hz, and at 75, 125, and 210 Hz, the 1st, 2nd, and 3rd orders are successively added, including on-axis equalized max-*r*<sup>E</sup> weightings. Above 2 kHz, both spatial aliasing and modal breakup of the transducer cones affect the directivity. However, these beamforming-direction-dependent distortions are often negligible in typical rooms.


#### **7.4 Auditory Objects of the IKO**

#### *7.4.1 Static Auditory Objects*

The study in [16] showed that distance control by changing the directivity and its orientation can also be achieved with the IKO in a real room, cf. Fig. 7.10. The experiments used stationary pink noise and could create auditory objects nearly 2 m behind the IKO, which corresponds to the distance between the IKO and the front wall of the playback room.

The maximum distance of auditory objects created by the IKO is strongly signal-dependent. Experiments in [14] showed that the auditory distance of pink-noise bursts decreased for shorter fade-in times, while the fade-out time had no influence, cf. Fig. 7.11. A transient click sound was perceived even closer to the IKO. This can be explained by the precedence effect, which favors the earlier direct sound over the sound reflected from the walls. While this effect is strong for transient sounds, it is inhibited for stationary sounds with long fade-in times.

However, the precedence effect can be reduced even for transient click sounds by simultaneous playback of a masker sound that reduces the influence of the direct sound [29]. In comparison to no masker, playing a pink-noise masker doubles the auditory distance, cf. Fig. 7.12. Using the room noise as a masker by playing the

target sound very softly further increases the distance and yields a perception that is detached from the IKO.

#### *7.4.2 Moving Auditory Objects*

The studies in [14, 15] extended the previous listening experiments towards simple time-varying beam directions, such as left/right, front/back, or circular movements. To report the perceived locations of the moving auditory objects, listeners used a touch screen showing a floor plan of the room, including the listening position and the position of the IKO. They indicated the location of the auditory object's trajectory every 500 ms. The perceived trajectories depend on the listening position, but they can always be recognized, cf. Fig. 7.13. The empirical knowledge was

**Fig. 7.13** Average perceived locations for each 500 ms step during front/back-movement (dark gray) and left/right-movement (light gray) at two listening positions, triangle indicates start and asterisk end of the trajectory

**Fig. 7.14** Average perceived locations for each 500 ms step during circular movement of transient sound (dark gray) and stationary noise (light gray) without and with additional reflectors, triangle indicates start and asterisk end of the trajectory

applied in the artistic study in [14] about body-space relations, composing sounds that are spatialized with different static directions and simple movements.

For concerts, the artistic practice evolved to set up the IKO together with reflector baffles, cf. Fig. 7.14. A recent study in [30] investigated their effect on the perception of moving transient and stationary sounds. The baffles obviously reduce the signal-dependency by contributing additional reflection paths that contrast the direct sound.

#### **7.5 Practical Free-Software Examples**

#### *7.5.1 IEM Room Encoder and Directivity Shaper*

The IEM Room Encoder VST plug-in, cf. Fig. 4.36, can not only simulate the room reflections of an omnidirectional sound source based on the image-source method, but it also supports directional sound sources. As format, it employs Ambisonics with ACN ordering and adjustable normalization up to seventh order. Thus, it enables utilizing data from directivity measurements or even directional recordings made with a surrounding spherical microphone, e.g. to put real instrument recordings into the virtual room.

As an alternative, the IEM Directivity Shaper, cf. Fig. 7.15, provides simple means to generate a frequency-dependent directivity pattern from scratch and to apply it to a mono input signal. This is useful, e.g., to generate the typical rotary-speaker effect of a Leslie cabinet.
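The essence of such a rotary effect can be sketched as a time-varying Ambisonic encoder that rotates a beam direction around the horizon; this is only an illustrative first-order sketch, not the plug-in's actual implementation:

```python
import numpy as np

fs = 48000                         # sample rate (assumed)
f_rot = 6.0                        # assumed rotor speed in Hz, Leslie-like
t = np.arange(fs) / fs             # one second of time
x = np.sin(2 * np.pi * 220.0 * t)  # mono input signal

phi0 = 2 * np.pi * f_rot * t       # beam azimuth rotating on the horizon
# time-varying first-order encoder (ACN/N3D): chi = y(theta0(t)) * x(t);
# the Directivity Shaper additionally shapes the pattern per frequency band
chi = np.stack([np.ones_like(t),               # Y_0^0
                np.sqrt(3.0) * np.sin(phi0),   # Y_1^-1
                np.zeros_like(t),              # Y_1^0 (beam stays horizontal)
                np.sqrt(3.0) * np.cos(phi0)])  # Y_1^1
chi = chi * x                      # Ambisonic directivity signals, shape (4, fs)
```

Fed into the decoding chain of Fig. 7.7, these directivity signals would make the radiated beam sweep around the loudspeaker array at the rotor rate.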

**Fig. 7.15** IEM Directivity Shaper plug-in

#### *7.5.2 IEM Cubes 5.1 Player and Surround with Depth*

As shown in Fig. 7.5, a pair of loudspeaker cubes can create a stable auditory event in between them to replace an actual center loudspeaker. In order to play back an entire 5.1 production, the IEM cubes 5.1 Player plug-in extends this approach by two additional beams to the side walls for the surround channels, cf. Fig. 7.16. The plug-in provides control of the shape, direction, and level of all beams, as well as a delay compensation for the reflection paths.

Surround sound with depth can be realized with a quadraphonic setup of four loudspeaker cubes and a combination of the cubes Surround Decoder and multiple Distance Encoder plug-ins, cf. Fig. 7.16. For each source, the Distance Encoder controls position and distance, i.e. the blending between the two layers. The output of the plug-in is a 10-channel audio stream including 7 channels for third-order (inner layer) and 3 for first-order 2D Ambisonics (outer depth layer). The cubes Surround Decoder plug-in decodes the 10-channel audio stream and distributes the signals to the 16 drivers of four loudspeaker cubes. For each loudspeaker cube, the directions to excite direct and reflected sound of the inner layer and the diffuse sound of the depth layer can be adjusted in order to adapt to the playback environment. Additionally, the directivity patterns for direct, reflected, and diffuse sound beams

**Fig. 7.16** IEM cubes 5.1 Player, cubes Surround Decoder and Distance Encoder plug-ins

can be controlled, as well as a delay to compensate for the longer propagation paths of the reflected sound. The plug-ins are available at https://git.iem.at/audioplugins/CubeSpeakerPlugins.

#### *7.5.3 IKO*

Spatialization using the IKO can use a similar infrastructure of plug-ins as surrounding loudspeaker arrays. Ambisonic encoder plug-ins, such as the ambix\_encoder or the IEM StereoEncoder or MultiEncoder, create the third-order Ambisonic signals that are subsequently fed to a decoder. Decoding to the IKO requires the processing steps shown in Fig. 7.7: radiation control filters in the spherical harmonic domain, decoding from spherical harmonics to transducer signals, as well as crosstalk cancellation and equalization of the transducers. This processing can be summarized in a 16 (spherical harmonics up to third order) × 20 (transducers) filter matrix. Convolution can be done efficiently using the mcfx\_convolver plug-in. Filter presets for the IKO can be found at http://phaidra.kug.ac.at/o:79235.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

*Many have written of the experience of mathematical beauty as being comparable to that derived from the greatest art.*

> S. Zeki, J.P. Romaya, D.M.T. Benincasa, M.F. Atiyah [1] Frontiers in Human Neuroscience, Feb. 2014.

#### **A.1 Harmonic Functions**

The Laplacian is defined in the D-dimensional Cartesian space as

$$
\Delta = \sum\_{j=1}^{D} \frac{\partial^2}{\partial x\_j^2}.
$$

The Laplacian eigenproblem

$$\Delta f = -\lambda f$$

is solved by harmonics on a finite interval.

#### **A.2 Laplacian in Orthogonal Coordinates**

In general, coordinates can be expressed by n-tuples of values. For instance, the Cartesian coordinates are $(x_1, x_2, \ldots)$ and the coordinates of another coordinate system are $(u_1, u_2, \ldots)$, and both describe the location of a point in a space depending on a finite number of dimensions. Each location of the space should be accessible by both


coordinate systems, and there should be a bijective mapping between both systems, e.g. $u_j = u_j(x_1, x_2, \ldots)$. A single differentiation with regard to the component $x_i$, for instance, is described by the chain rule and consists of the sum of weighted partial differentials with regard to the $u_j$:

$$\frac{\partial}{\partial x_i} = \sum_j \frac{\partial u_j}{\partial x_i} \frac{\partial}{\partial u_j}. \tag{A.1}$$

Written in terms of vectors, the (Cartesian) gradient $\boldsymbol{\nabla} = \frac{\partial}{\partial \boldsymbol{x}}$, expressed with $\frac{\partial}{\partial \boldsymbol{u}}$, yields:

$$\boldsymbol{\nabla} = \frac{\partial \boldsymbol{u}^{\mathrm{T}}}{\partial \boldsymbol{x}} \frac{\partial}{\partial \boldsymbol{u}} =: \boldsymbol{J}_{\partial u/\partial x}^{\mathrm{T}}\, \frac{\partial}{\partial \boldsymbol{u}}, \tag{A.2}$$

for which the Jacobian matrix $\boldsymbol{J}_{\partial u/\partial x} = \left[\frac{\partial u_j}{\partial x_i}\right]_{ij}$, written either in dependency of $\boldsymbol{x}$ or $\boldsymbol{u}$, represents all the partial derivatives of the mapping between the coordinate systems. For bijective mappings, also the Jacobian of the inverse mapping exists, $\boldsymbol{J}_{\partial x/\partial u} = \left[\frac{\partial x_i}{\partial u_j}\right]_{ji}$. The coordinate systems are equivalent if the determinant of the Jacobian is non-zero, $|\boldsymbol{J}| \ne 0$.

Orthogonal coordinate systems have the interesting property that the rows (or columns) of the Jacobian are orthogonal, so that $\boldsymbol{J}^{\mathrm{T}}\boldsymbol{J}$ yields a diagonal matrix. With $\boldsymbol{J}_{\partial x/\partial u} = \frac{\partial \boldsymbol{x}^{\mathrm{T}}}{\partial \boldsymbol{u}}$ the meaning of this property becomes easier to understand: the differential changes in location $\partial\boldsymbol{x}/\partial u_j$ of each Cartesian coordinate in the direction of each individual non-Cartesian coordinate $u_j$ describe an orthogonal set of motion directions in space, whose orientation depends on the location and whose individual lengths may vary.

Our goal here is to obtain a description of the Laplacian $\Delta = \sum_i \frac{\partial^2}{\partial x_i^2}$ in the Helmholtz equation, and it can be obtained with the chain rule, now calculating from $x_i$ to $u_j$,

$$\begin{split} \Delta = \sum_i \frac{\partial}{\partial x_i}\left(\frac{\partial}{\partial x_i}\right) &= \sum_i \frac{\partial}{\partial x_i}\left(\sum_j \frac{\partial u_j}{\partial x_i}\, \frac{\partial}{\partial u_j}\right) \\ &= \sum_{i,j} \frac{\partial^2 u_j}{\partial x_i^2}\, \frac{\partial}{\partial u_j} + \sum_{i,j,k} \frac{\partial u_j}{\partial x_i}\, \frac{\partial u_k}{\partial x_i}\, \frac{\partial^2}{\partial u_j \partial u_k}, \end{split} \tag{A.3}$$

$$\text{with } \sum_{i,j,k} \frac{\partial u_j}{\partial x_i}\, \frac{\partial u_k}{\partial x_i}\, \frac{\partial^2}{\partial u_j \partial u_k} = \mathbf{1}^{\mathrm{T}} \left[ \underbrace{\left(\boldsymbol{J}^{\mathrm{T}}\boldsymbol{J}\right)}_{\text{orthog.: diagonal}} \circ \left(\frac{\partial}{\partial \boldsymbol{u}}\, \frac{\partial}{\partial \boldsymbol{u}^{\mathrm{T}}}\right) \right] \mathbf{1} = \sum_{i,j} \left(\frac{\partial u_j}{\partial x_i}\right)^2 \frac{\partial^2}{\partial u_j^2},$$

with ◦ denoting the element-wise (Hadamard) product. Orthogonal coordinates largely simplify the Laplacian (see last line): it consists of first- and second-order differentials with regard to the new coordinates individually, with all mixed derivatives canceling. Both first- and second-order differentials are weighted by the partial derivatives of the coordinate mapping. For each $u\_j$, the Laplacian is composed of those two expressions

$$\Delta = \sum\_{j} \Delta\_{u\_j}, \quad \text{where } \Delta\_{u\_j} = \left[\sum\_{i} \frac{\partial^2 u\_j}{\partial \mathbf{x}\_i^2}\right] \frac{\partial}{\partial u\_j} + \left[\sum\_{i} \left(\frac{\partial u\_j}{\partial \mathbf{x}\_i}\right)^2\right] \frac{\partial^2}{\partial u\_j^2}. \quad (\text{A.4})$$
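The diagonality of $\boldsymbol{J}^\mathrm{T}\boldsymbol{J}$ is easy to check numerically. The following minimal Python sketch (illustrative only, not from the book; function names are made up) builds the Jacobian of the spherical mapping used in the next section, $x = r\cos\varphi\sqrt{1-\zeta^2}$, $y = r\sin\varphi\sqrt{1-\zeta^2}$, $z = r\,\zeta$, by central differences and inspects $\boldsymbol{J}^\mathrm{T}\boldsymbol{J}$:

```python
import math

def cart(u):
    """Spherical-to-Cartesian mapping (r, phi, zeta) -> (x, y, z)."""
    r, phi, zeta = u
    s = math.sqrt(1.0 - zeta * zeta)
    return [r * math.cos(phi) * s, r * math.sin(phi) * s, r * zeta]

def jacobian(u0, h=1e-6):
    """Numerical Jacobian d(x,y,z)/d(r,phi,zeta) by central differences."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        up, um = list(u0), list(u0)
        up[j] += h
        um[j] -= h
        xp, xm = cart(up), cart(um)
        for i in range(3):
            J[i][j] = (xp[i] - xm[i]) / (2.0 * h)
    return J

def gram(J):
    """J^T J; diagonal if and only if the coordinates are orthogonal."""
    return [[sum(J[k][i] * J[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

J = jacobian([2.0, 0.7, 0.3])
G = gram(J)
off_diag = max(abs(G[i][j]) for i in range(3) for j in range(3) if i != j)
```

The off-diagonal entries vanish up to finite-difference error, and the diagonal entries reproduce the squared lengths $1$, $r^2(1-\zeta^2)$, and $r^2/(1-\zeta^2)$ of the three motion directions.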

#### **A.3 Laplacian in Spherical Coordinates**

The right-handed spherical coordinate system of ISO 31-11 and ISO 80000-2, [2, 3], uses a radius $r$, an azimuth angle $\varphi$, and a zenith angle $\vartheta$, mapping to Cartesian coordinates $x = r\cos\varphi\sin\vartheta$, $y = r\sin\varphi\sin\vartheta$, $z = r\cos\vartheta$, or inversely $r = \sqrt{x^2+y^2+z^2}$, $\varphi = \arctan\frac{y}{x}$, $\vartheta = \arctan\frac{\sqrt{x^2+y^2}}{z}$, see Fig. 4.11.

Re-expressing the zenith angle coordinate by $\zeta = \cos\vartheta = \frac{z}{r}$ reduces the effort in calculation and yields $x = r\cos\varphi\sqrt{1-\zeta^2}$, $y = r\sin\varphi\sqrt{1-\zeta^2}$, $z = r\,\zeta$.

In order to obtain solutions along the angular dimensions azimuth and zenith, we first need to re-write the Laplacian from Cartesian to spherical coordinates. For the first-order derivative along the $x$ axis, we get the generalized differential

$$\frac{\partial}{\partial x} = \left[\frac{\partial r}{\partial x}\right] \frac{\partial}{\partial r} + \left[\frac{\partial \varphi}{\partial x}\right] \frac{\partial}{\partial \varphi} + \left[\frac{\partial \zeta}{\partial x}\right] \frac{\partial}{\partial \zeta}.$$

As the spherical coordinates are orthogonal, all mixed second-order derivatives between different spherical coordinates vanish. We may differentiate a second time with regard to $x$:

$$\begin{split} \frac{\partial}{\partial x} \left[ \frac{\partial}{\partial x} \right] &= \left[ \frac{\partial^2 r}{\partial x^2} + \left( \frac{\partial r}{\partial x} \right)^2 \frac{\partial}{\partial r} \right] \frac{\partial}{\partial r} \\ &\quad+ \left[ \frac{\partial^2 \varphi}{\partial x^2} + \left( \frac{\partial \varphi}{\partial x} \right)^2 \frac{\partial}{\partial \varphi} \right] \frac{\partial}{\partial \varphi} + \left[ \frac{\partial^2 \zeta}{\partial x^2} + \left( \frac{\partial \zeta}{\partial x} \right)^2 \frac{\partial}{\partial \zeta} \right] \frac{\partial}{\partial \zeta} . \end{split}$$

Obviously, we require all first-order derivatives squared, and all second-order derivatives of the spherical coordinates.

#### *A.3.1 The Radial Part*

With $r = \sqrt{x^2 + y^2 + z^2}$, we obtain for the radial part $\Delta\_r = \left[\sum\_i \frac{\partial^2 r}{\partial x\_i^2}\right]\frac{\partial}{\partial r} + \left[\sum\_i\left(\frac{\partial r}{\partial x\_i}\right)^2\right]\frac{\partial^2}{\partial r^2}$ of the Laplacian

$$
\left[\frac{\partial r}{\partial x}\right]^2 = \left[\frac{\partial \sqrt{x^2 + y^2 + z^2}}{\partial x}\right]^2 = \left[\frac{1}{2\sqrt{x^2 + y^2 + z^2}} 2x\right]^2 = \left[\frac{x}{r}\right]^2 = \frac{x^2}{r^2},
$$

$$
\frac{\partial^2 r}{\partial x^2} = \frac{\partial}{\partial x} \left[\frac{x}{r}\right] = \frac{1}{r}\frac{\partial x}{\partial x} + x\frac{\partial r}{\partial x}\frac{\partial}{\partial r}\frac{1}{r} = \frac{1}{r} - x\frac{x}{r}\frac{1}{r^2} = \frac{r^2 - x^2}{r^3}.
$$

For $x$, $y$, and $z$ altogether, this yields

$$\Delta\_r = \left[\frac{3r^2 - x^2 - y^2 - z^2}{r^3}\right] \frac{\partial}{\partial r} + \left[\frac{x^2}{r^2} + \frac{y^2}{r^2} + \frac{z^2}{r^2}\right] \frac{\partial^2}{\partial r^2} = \frac{2}{r} \frac{\partial}{\partial r} + \frac{\partial^2}{\partial r^2} . \tag{A.5}$$

*2D*. In two dimensions, there is no *z* coordinate, therefore there is just one term fewer:

$$
\Delta\_{\rm r,2D} = \left[\frac{2r^2 - \mathbf{x}^2 - \mathbf{y}^2}{r^3}\right] \frac{\partial}{\partial r} + \left[\frac{\mathbf{x}^2}{r^2} + \frac{\mathbf{y}^2}{r^2}\right] \frac{\partial^2}{\partial r^2} = \frac{1}{r} \frac{\partial}{\partial r} + \frac{\partial^2}{\partial r^2}.\tag{A.6}
$$
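As a numerical cross-check (a minimal Python sketch, not part of the book; the test function $f(r)=e^{-r}$ is an arbitrary choice), the radial operator of (A.5) applied to a radial function should agree with the Cartesian finite-difference Laplacian of the same function:

```python
import math

def f(x, y, z):
    """Radial test function f = exp(-r)."""
    return math.exp(-math.sqrt(x * x + y * y + z * z))

def laplacian_cartesian(x, y, z, h=1e-3):
    """Central-difference Laplacian in Cartesian coordinates."""
    lap = 0.0
    for dx, dy, dz in ((h, 0, 0), (0, h, 0), (0, 0, h)):
        lap += (f(x + dx, y + dy, z + dz) - 2.0 * f(x, y, z)
                + f(x - dx, y - dy, z - dz)) / (h * h)
    return lap

def laplacian_radial(r):
    """Radial operator (A.5) applied to exp(-r): (2/r) f' + f''."""
    return 2.0 / r * (-math.exp(-r)) + math.exp(-r)

x, y, z = 0.9, -0.4, 1.2
r = math.sqrt(x * x + y * y + z * z)
err = abs(laplacian_cartesian(x, y, z) - laplacian_radial(r))
```

Both evaluations agree up to the finite-difference discretization error.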

#### *A.3.2 The Azimuthal Part*

With $\varphi = \arctan\frac{y}{x}$ and $\frac{\mathrm{d}}{\mathrm{d}x}\arctan x = \frac{1}{1+x^2}$, the azimuthal part $\Delta\_\varphi = \left[\sum\_i \frac{\partial^2\varphi}{\partial x\_i^2}\right]\frac{\partial}{\partial\varphi} + \left[\sum\_i\left(\frac{\partial\varphi}{\partial x\_i}\right)^2\right]\frac{\partial^2}{\partial\varphi^2}$ of the Laplacian becomes

$$\begin{split} \left[\frac{\partial\varphi}{\partial x}\right]^2 &= \left[\frac{\partial \arctan\frac{y}{x}}{\partial x}\right]^2 = \left[\frac{1}{1+\frac{y^2}{x^2}}\,\frac{\partial}{\partial x}\frac{y}{x}\right]^2 = \left[-\frac{x^2}{x^2+y^2}\,\frac{y}{x^2}\right]^2 = \left[-\frac{y}{r\_\mathrm{xy}^2}\right]^2 = \frac{y^2}{r\_\mathrm{xy}^4},\\ \left[\frac{\partial\varphi}{\partial y}\right]^2 &= \left[\frac{\partial \arctan\frac{y}{x}}{\partial y}\right]^2 = \left[\frac{x^2}{x^2+y^2}\,\frac{\partial}{\partial y}\frac{y}{x}\right]^2 = \left[\frac{x^2}{x^2+y^2}\,\frac{1}{x}\right]^2 = \left[\frac{x}{r\_\mathrm{xy}^2}\right]^2 = \frac{x^2}{r\_\mathrm{xy}^4},\\ \frac{\partial^2\varphi}{\partial x^2} &= \frac{\partial}{\partial x}\left[-\frac{y}{r\_\mathrm{xy}^2}\right] = \frac{2xy}{r\_\mathrm{xy}^4}, \qquad \frac{\partial^2\varphi}{\partial y^2} = \frac{\partial}{\partial y}\left[\frac{x}{r\_\mathrm{xy}^2}\right] = -\frac{2xy}{r\_\mathrm{xy}^4}, \qquad \frac{\partial\varphi}{\partial z} = 0. \end{split}$$

It only depends on *x* and *y*, and altogether, we obtain

$$
\Delta\_\varphi = \left[\frac{2xy - 2xy}{r\_\mathrm{xy}^4}\right] \frac{\partial}{\partial\varphi} + \left[\frac{x^2 + y^2}{r\_\mathrm{xy}^4}\right] \frac{\partial^2}{\partial\varphi^2} = \frac{1}{r\_\mathrm{xy}^2} \frac{\partial^2}{\partial\varphi^2} = \frac{1}{r^2(1 - \zeta^2)} \frac{\partial^2}{\partial\varphi^2}.\tag{A.7}
$$

*2D*. In two dimensions, ζ = 0, therefore

$$
\Delta\_{\varphi,2D} = \frac{1}{r^2} \frac{\partial^2}{\partial \varphi^2}. \tag{A.8}
$$

#### *A.3.3 The Zenithal Part*

The zenith angle is actually $\vartheta$, and we define $\zeta = \cos\vartheta$ as a variable to express it, in order to simplify the derivation. With $\zeta = \frac{z}{\sqrt{x^2+y^2+z^2}} = \frac{z}{r}$, the zenithal part $\Delta\_\zeta = \left[\sum\_i \frac{\partial^2\zeta}{\partial x\_i^2}\right]\frac{\partial}{\partial\zeta} + \left[\sum\_i\left(\frac{\partial\zeta}{\partial x\_i}\right)^2\right]\frac{\partial^2}{\partial\zeta^2}$ becomes

$$\begin{split} \left[\frac{\partial\zeta}{\partial x}\right]^2 &= \left[\frac{\partial}{\partial x}\frac{z}{r}\right]^2 = \left[-\frac{z}{r^2}\frac{x}{r}\right]^2 = \left[-\frac{xz}{r^3}\right]^2 = \frac{x^2z^2}{r^6},\\ \left[\frac{\partial\zeta}{\partial z}\right]^2 &= \left[\frac{\partial}{\partial z}\frac{z}{r}\right]^2 = \left[\frac{1}{r} - \frac{z}{r^2}\frac{z}{r}\right]^2 = \left[\frac{r^2 - z^2}{r^3}\right]^2 = \left[\frac{r\_\mathrm{xy}^2}{r^3}\right]^2 = \frac{r\_\mathrm{xy}^4}{r^6},\\ \frac{\partial^2\zeta}{\partial x^2} &= \frac{\partial}{\partial x}\left[-\frac{xz}{r^3}\right] = -\frac{z}{r^3} + 3xz\frac{1}{r^4}\frac{x}{r} = z\,\frac{3x^2 - r^2}{r^5},\\ \frac{\partial^2\zeta}{\partial z^2} &= \frac{\partial}{\partial z}\left[\frac{r\_\mathrm{xy}^2}{r^3}\right] = -3\frac{r\_\mathrm{xy}^2}{r^4}\frac{z}{r} = -z\,\frac{3r\_\mathrm{xy}^2}{r^5}. \end{split}$$

For *x*, *y*, and *z* altogether, we get

$$\Delta\_\zeta = z\,\frac{3x^2 + 3y^2 - 2r^2 - 3r\_\mathrm{xy}^2}{r^5}\,\frac{\partial}{\partial\zeta} + \frac{(x^2 + y^2)\,z^2 + r\_\mathrm{xy}^4}{r^6}\,\frac{\partial^2}{\partial\zeta^2} = z\,\frac{-2r^2}{r^5}\,\frac{\partial}{\partial\zeta} + \frac{r^2\, r\_\mathrm{xy}^2}{r^6}\,\frac{\partial^2}{\partial\zeta^2}$$

$$= -z\,\frac{2}{r^3}\,\frac{\partial}{\partial\zeta} + \frac{r^2(1-\zeta^2)}{r^4}\,\frac{\partial^2}{\partial\zeta^2} = -\frac{2}{r^2}\,\zeta\,\frac{\partial}{\partial\zeta} + \frac{1-\zeta^2}{r^2}\,\frac{\partial^2}{\partial\zeta^2}. \tag{A.9}$$

*2D*. This part does not exist in 2D.

#### *A.3.4 Azimuthal Solution in 2D and 3D*

The azimuth harmonics are found by solving the eigenvalue problem $\Delta\_\varphi\,\Phi = -\lambda\,\Phi$, i.e.

$$\frac{\mathrm{d}^2}{\mathrm{d}\varphi^2}\Phi = -\lambda \, r\_{\mathrm{xy}}^2 \Phi. \tag{A.10}$$

We know that $(\cos x)'' = -\cos x$ and $(\sin x)'' = -\sin x$, therefore we can insert the solutions

$$\hat{\Phi} = \begin{cases} \cos(a\varphi), & \text{for } a \ge 0, \\ \sin(|a|\varphi), & \text{for } a < 0, \end{cases}$$

and obtain with $\frac{\mathrm{d}^2}{\mathrm{d}\varphi^2}\hat\Phi = -a^2\,\hat\Phi$ the characteristic equation that fixes $a$

$$-a^2 = -\lambda \, r\_{\text{xy}}^2.$$

Geometrically, we desire that $\hat\Phi(\varphi) = \hat\Phi(\varphi + 2\pi l)$ with $l \in \mathbb{Z}$. This is only possible with $\lambda\, r\_\mathrm{xy}^2 = m^2$ and $m \in \mathbb{Z}$.

We can therefore define for $-\infty < m < \infty$ the terms of a normalized Fourier series

$$\Phi\_m(\varphi) = \frac{1}{\sqrt{2\pi}} \begin{cases} \sqrt{2}\sin(|m|\varphi), & \text{for } m < 0, \\ 1, & \text{for } m = 0, \\ \sqrt{2}\cos(m\varphi), & \text{for } m > 0. \end{cases} \tag{A.11}$$

The azimuth harmonics are orthogonal: none of the products $\cos(i\varphi)\cos(j\varphi)$ or $\sin(i\varphi)\sin(j\varphi)$ produces a constant component unless $i = j$, and the mixed cosine-sine product $\cos(i\varphi)\sin(j\varphi)$ never does. Normalization ensures that the non-zero result is unity

$$\int\_0^{2\pi} \Phi\_i \, \Phi\_j \, \mathrm{d}\varphi = \delta\_{ij} = \begin{cases} 1, & \text{for } i = j, \\ 0, & \text{else}, \end{cases} \tag{A.12}$$

$$\text{by } \int\_0^{2\pi} \frac{1}{\sqrt{2\pi}^2} \,\text{d}\varphi = \int\_0^{2\pi} \frac{\cos^2 m\varphi}{\sqrt{\pi}^2} \,\text{d}\varphi = \int\_0^{2\pi} \frac{\sin^2 m\varphi}{\sqrt{\pi}^2} \,\text{d}\varphi = 1.$$
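The orthonormality (A.12) can be verified by simple quadrature. The following Python sketch (illustrative only; the helper names are made up) evaluates the Gram matrix of the circular harmonics (A.11) for a few orders:

```python
import math

def Phi(m, phi):
    """Normalized circular harmonics (A.11)."""
    if m < 0:
        return math.sin(-m * phi) / math.sqrt(math.pi)
    if m == 0:
        return 1.0 / math.sqrt(2.0 * math.pi)
    return math.cos(m * phi) / math.sqrt(math.pi)

def inner(i, j, n=1024):
    """Quadrature of the product over one period (exact for band-limited integrands)."""
    h = 2.0 * math.pi / n
    return sum(Phi(i, k * h) * Phi(j, k * h) for k in range(n)) * h

gram = {(i, j): inner(i, j) for i in range(-2, 3) for j in range(-2, 3)}
```

As the integrands are trigonometric polynomials of low degree, the equidistant quadrature reproduces $\delta\_{ij}$ up to machine precision.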

*2D functions*. With unbounded $|m|\to\infty$, the circular harmonics are complete in the Hilbert space of square-integrable functions on the circle. By their orthonormality, we can derive a transformation integral for a function $g(\varphi)$ that is to be represented as the series

$$g = \sum\_{j=-\infty}^{\infty} \gamma\_j \,\Phi\_j \tag{A.13}$$

by integrating $g$ against $\Phi\_m$ over $\varphi$

$$\int\_{-\pi}^{\pi} g(\varphi) \, \Phi\_m \, \mathrm{d}\varphi = \sum\_{j=-\infty}^{\infty} \gamma\_j \underbrace{\int\_{-\pi}^{\pi} \Phi\_j \Phi\_m \, \mathrm{d}\varphi}\_{=\delta\_{jm}} = \gamma\_m. \tag{A.14}$$
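The analysis-synthesis pair (A.13)/(A.14) can be demonstrated numerically. This Python sketch (the band-limited test signal is an arbitrary assumption, for illustration only) analyzes a function on the circle and resynthesizes it from the coefficients:

```python
import math

def Phi(m, phi):
    """Normalized circular harmonics (A.11)."""
    if m < 0:
        return math.sin(-m * phi) / math.sqrt(math.pi)
    if m == 0:
        return 1.0 / math.sqrt(2.0 * math.pi)
    return math.cos(m * phi) / math.sqrt(math.pi)

def g(phi):
    """Hypothetical band-limited test signal on the circle."""
    return 0.5 + math.cos(phi) - 0.3 * math.sin(2.0 * phi)

def coeff(m, n=1024):
    """Transformation integral (A.14) by quadrature."""
    h = 2.0 * math.pi / n
    return sum(g(k * h) * Phi(m, k * h) for k in range(n)) * h

gamma = {m: coeff(m) for m in range(-3, 4)}

def resynth(phi):
    """Truncated series (A.13) from the analyzed coefficients."""
    return sum(gamma[m] * Phi(m, phi) for m in gamma)

max_err = max(abs(resynth(0.1 * k) - g(0.1 * k)) for k in range(63))
```

Because the test signal only contains orders up to 2, the truncated resynthesis is exact up to quadrature rounding.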

*2D panning functions*. To decompose an infinitely narrow unit-surface Dirac delta function that represents an infinite-order panning function towards the direction $\varphi\_\mathrm{s}$,

$$\delta(\varphi - \varphi\_s) = \begin{cases} \lim\_{\varepsilon \to 0} \frac{1}{2\varepsilon}, & \text{for } |\varphi - \varphi\_s| \le \varepsilon, \\ 0 & \text{otherwise}, \end{cases} \tag{A.15}$$

we then obtain the coefficients from the transformation integral

$$\begin{split} \gamma\_{m} &= \int\_{-\pi}^{\pi} \delta(\varphi - \varphi\_{\mathrm{s}}) \, \Phi\_{m} \, \mathrm{d}\varphi = \Phi\_{m}(\varphi\_{\mathrm{s}}) \lim\_{\varepsilon \to 0} \int\_{-\varepsilon}^{\varepsilon} \frac{1}{2\varepsilon} \mathrm{d}\varphi \\ &= \Phi\_{m}(\varphi\_{\mathrm{s}}). \end{split} \tag{A.16}$$

Typically, a finite-order series will be employed as a panning function

$$g\_N = \sum\_{m=-N}^{N} a\_m\, \Phi\_m(\varphi\_\mathrm{s})\, \Phi\_m(\varphi),\tag{A.17}$$

involving weights $a\_m = a\_{|m|}$ that control the side lobes. To evaluate the loudness measure $E$, we can write

$$E = \int\_{-\pi}^{\pi} g\_N^2 \, \mathrm{d}\varphi = \sum\_{i,j=-N}^{N} a\_i \, a\_j \, \Phi\_i(\varphi\_\mathrm{s}) \, \Phi\_j(\varphi\_\mathrm{s}) \int\_{-\pi}^{\pi} \Phi\_i \, \Phi\_j \, \mathrm{d}\varphi = \sum\_{m=-N}^{N} a\_m^2 \, \Phi\_m(\varphi\_\mathrm{s})^2$$

$$= \sum\_{m=0}^{N} \frac{2 - \delta\_m}{2\pi} \, a\_m^2. \tag{A.18}$$

For ϕ<sup>s</sup> = 0, we obtain an axisymmetric function in terms of a pure cosine series, as sin 0 = 0,

$$g\_N(\varphi) = \sum\_{m=0}^N a\_m \frac{2 - \delta\_m}{2\pi} \cos(m\varphi),\tag{A.19}$$

with the Kronecker delta $\delta\_m = 1$ for $m = 0$ and $0$ elsewhere. The axisymmetric panning function is easier to design.

*2D max*-*r*E. For the narrowest-possible spread, we maximize the length of *r*E,

$$r\_{\rm E} = \frac{\int\_{-\pi}^{\pi} g\_{\rm N}^2 \cos \varphi \,\mathrm{d}\varphi}{\int\_{-\pi}^{\pi} g\_{\rm N}^2 \,\mathrm{d}\varphi} = \frac{\int \sum\_{i,j=0}^{N} a\_i a\_j (2 - \delta\_i)(2 - \delta\_j) \cos(i\varphi) \cos(j\varphi) \,\cos(\varphi) \,\mathrm{d}\varphi}{(2\pi)^2 E}$$

$$= \frac{\sum\_{i=1}^{N} a\_i a\_{i-1}}{\pi E} =: \frac{\hat{r}\_{\rm E}}{E} \tag{A.20}$$

where we used $\cos(i\varphi)\cos(\varphi) = \frac{\cos[(i+1)\varphi]+\cos[(i-1)\varphi]}{2-\delta\_i}$, inserted the orthogonality of the cosines $\int\_{-\pi}^{\pi} \frac{(2-\delta\_i)\cos(i\varphi)\cos(j\varphi)}{2\pi}\,\mathrm{d}\varphi = \delta\_{ij}$, and combined $\sum\_i(a\_i a\_{i-1} + a\_i a\_{i+1}) = 2\sum\_i a\_i a\_{i-1}$. To maximize, we zero the derivative with regard to $a\_m$

$$\begin{split} r'\_\mathrm{E} &= \frac{\hat{r}'\_\mathrm{E}}{E} - \frac{\hat{r}\_\mathrm{E}}{E^2}\, E' = \frac{1}{E} \left[\hat{r}'\_\mathrm{E} - E' \, r\_\mathrm{E}\right] = 0\\ &\Rightarrow\; a\_{m-1} + a\_{m+1} - (2 - \delta\_m)\, a\_m \, r\_\mathrm{E} = 0. \end{split}$$

If we assume that $a\_m = \cos(m\alpha)$ and insert this into $\frac{a\_{m+1}+a\_{m-1}}{2-\delta\_m} = a\_m \, r\_\mathrm{E}$, we recognize, by inserting the above theorem $\cos(m\alpha)\cos(\alpha) = \frac{\cos[(m+1)\alpha]+\cos[(m-1)\alpha]}{2-\delta\_m}$,

$$\frac{\cos[(m+1)\alpha] + \cos[(m-1)\alpha]}{2 - \delta\_m} = \cos(m\alpha) \, r\_\mathrm{E} = \cos(m\alpha)\cos(\alpha)$$

that $r\_\mathrm{E} = \cos\alpha$. To maximize $r\_\mathrm{E}$ under the constraint that $a\_{N+1} = 0$, we get the smallest-possible spread $\alpha = \pm\frac{\pi}{2}\frac{1}{N+1} = \pm\frac{90^\circ}{N+1}$

$$a\_m = \begin{cases} \cos\left(\frac{\pi}{2}\frac{m}{N+1}\right), & \text{for } 0 \le m \le \text{N},\\ 0, & \text{elsewhere.} \end{cases} \tag{A.21}$$

The max-*r*<sup>E</sup> panning function in 2D consequently is

$$g\_N(\varphi) = \sum\_{m=-N}^{N} a\_{m} \, \Phi\_{m}(\varphi\_\mathrm{s}) \, \Phi\_{m}(\varphi) = \sum\_{m=-N}^{N} \cos\left(\frac{\pi}{2} \frac{m}{N+1}\right) \, \Phi\_{m}(\varphi\_\mathrm{s}) \, \Phi\_{m}(\varphi) . \tag{A.22}$$
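A quick numerical experiment (a Python sketch for illustration, not from the book) confirms that the weights (A.21) yield $r\_\mathrm{E} = \cos\frac{90^\circ}{N+1}$ when (A.20) is evaluated by quadrature:

```python
import math

def g_N(phi, N):
    """Axisymmetric max-rE panning function (A.19) with the weights (A.21)."""
    total = 0.0
    for m in range(N + 1):
        a_m = math.cos(0.5 * math.pi * m / (N + 1))    # (A.21)
        delta = 1.0 if m == 0 else 0.0
        total += a_m * (2.0 - delta) / (2.0 * math.pi) * math.cos(m * phi)
    return total

def r_E(N, n=2048):
    """Energy centroid length r_E of (A.20), evaluated by quadrature."""
    h = 2.0 * math.pi / n
    num = sum(g_N(k * h, N) ** 2 * math.cos(k * h) for k in range(n)) * h
    den = sum(g_N(k * h, N) ** 2 for k in range(n)) * h
    return num / den

N = 3
measured = r_E(N)
predicted = math.cos(0.5 * math.pi / (N + 1))   # r_E = cos(90 deg / (N + 1))
```

For $N = 3$ both values equal $\cos 22.5^\circ \approx 0.924$.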

#### *A.3.5 Towards Spherical Harmonics (3D)*

The spherical harmonics are harmonics depending only on the angular terms. We may superimpose both parts $\Delta\_{\varphi\zeta} = \Delta\_\varphi + \Delta\_\zeta$ of the Laplacian and solve the eigenproblem $r^2\,\Delta\_{\varphi\zeta}\, Y = -\lambda\, Y$

$$\frac{1}{1-\zeta^2} \frac{\partial^2}{\partial \varphi^2} Y - 2\zeta \frac{\partial}{\partial \zeta} Y + (1-\zeta^2) \frac{\partial^2}{\partial \zeta^2} Y = -\lambda Y.$$

We assume $Y$ to be a product of the azimuth harmonics $\Phi\_m(\varphi)$ from above and yet undefined zenith harmonics $\Theta(\zeta)$

$$Y = \Phi\_m \Theta,\tag{A.23}$$

which yields a differential equation ($\partial \to \mathrm{d}$) only in $\zeta$ after inserting $\frac{\mathrm{d}^2}{\mathrm{d}\varphi^2}\Phi\_m = -m^2\,\Phi\_m$

$$
\Theta\, \frac{-m^2}{1-\zeta^2}\, \Phi\_m - 2\zeta\, \Phi\_m \frac{\mathrm{d}}{\mathrm{d}\zeta} \Theta + (1-\zeta^2)\, \Phi\_m \frac{\mathrm{d}^2}{\mathrm{d}\zeta^2} \Theta = -\lambda\, \Phi\_m\, \Theta.
$$

And after dividing by $\Phi\_m$, we obtain the *associated Legendre differential equation*

$$\begin{split} \frac{-m^2}{1-\zeta^2} \Theta - 2\zeta \frac{\mathrm{d}}{\mathrm{d}\zeta} \Theta + (1-\zeta^2) \frac{\mathrm{d}^2}{\mathrm{d}\zeta^2} \Theta &= -\lambda \Theta, \\\left[ (1-\zeta^2) \frac{\mathrm{d}^2}{\mathrm{d}\zeta^2} - 2\zeta \frac{\mathrm{d}}{\mathrm{d}\zeta} + \lambda - \frac{m^2}{1-\zeta^2} \right] \Theta &= 0. \end{split} \tag{A.24}$$
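For a plausibility check of (A.24), one can insert a known zenith function, e.g. $\Theta(\zeta) = \zeta\sqrt{1-\zeta^2}$, which is proportional to the associated Legendre function $P\_2^1$ defined below, with $m = 1$ and the eigenvalue $\lambda = n(n+1) = 6$ anticipated from Sect. A.3.6. A minimal Python sketch with finite differences (illustrative, not from the book):

```python
import math

def Theta(zeta):
    """Zenith test function, proportional to the associated Legendre P_2^1."""
    return zeta * math.sqrt(1.0 - zeta * zeta)

def residual(zeta, m=1, lam=6.0, h=1e-4):
    """Residual of (A.24): (1-z^2) Theta'' - 2 z Theta' + (lam - m^2/(1-z^2)) Theta,
    with the derivatives approximated by central differences."""
    d1 = (Theta(zeta + h) - Theta(zeta - h)) / (2.0 * h)
    d2 = (Theta(zeta + h) - 2.0 * Theta(zeta) + Theta(zeta - h)) / (h * h)
    return ((1.0 - zeta * zeta) * d2 - 2.0 * zeta * d1
            + (lam - m * m / (1.0 - zeta * zeta)) * Theta(zeta))

worst = max(abs(residual(z)) for z in (-0.6, -0.2, 0.1, 0.5))
```

The residual vanishes up to the finite-difference error, confirming that this $\Theta$ solves (A.24) for $m = 1$, $\lambda = 6$.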

#### *A.3.6 Zenithal Solution: Associated Legendre Differential Equation*

The associated Legendre differential equation (written in *x* and *y* for mathematical simplicity) is

$$(1 - \mathbf{x}^2)\mathbf{y}'' - 2\mathbf{x}\mathbf{y}' + \left[\lambda - \frac{m^2}{1 - \mathbf{x}^2}\right]\mathbf{y} = \mathbf{0},$$

or after gathering the derivatives

$$\left[ (1 - x^2) \mathbf{y}' \right]' + \left[ \lambda - \frac{m^2}{1 - x^2} \right] \mathbf{y} = \mathbf{0}.$$

*Simplifying the differential equation by* $\frac{1}{1-x^2}$. In the associated Legendre differential equation, we would like to get rid of the denominator $\frac{1}{1-x^2}$. In this case, it is typical to substitute $y = (1 - x^2)^\alpha\, v$ and try out which $\alpha$ succeeds. For insertion into the differential equation, the derivative of $y$ is

$$y' = \alpha(1-x^2)^{\alpha-1}(-2x)\, v + (1-x^2)^{\alpha} v' = -2\alpha(1-x^2)^{\alpha-1} x\, v + (1-x^2)^{\alpha} v',$$

and the second-order derivative term is

$$\begin{split} [(1-x^{2})y']' &= \left[ -2\alpha(1-x^{2})^{\alpha}x\, v + (1-x^{2})^{\alpha+1}\, v' \right]' \\ &= 4\alpha^{2}(1-x^{2})^{\alpha-1}x^{2}\, v - 2\alpha(1-x^{2})^{\alpha}\, v - 2\alpha(1-x^{2})^{\alpha}\, x\, v' \\ &\quad - 2(\alpha+1)(1-x^{2})^{\alpha}\, x\, v' + (1-x^{2})^{\alpha+1}\, v'' \\ &= (1-x^{2})^{\alpha}\left[ \frac{4\alpha^{2}}{1-x^{2}}\, x^{2}\, v - 2\alpha\, v - 2(2\alpha+1)\, x\, v' + (1-x^{2})\, v'' \right]. \end{split}$$

Together with the term $\left[\lambda - \frac{m^2}{1-x^2}\right] y$, the associated Legendre differential equation becomes

$$(1-x^2)^{\alpha} \left[ \frac{4\alpha^2}{1-x^2}\, x^2\, v - 2\alpha\, v + \left(\lambda - \frac{m^2}{1-x^2}\right) v - 2(2\alpha+1)\, x\, v' + (1-x^2)\, v'' \right] = 0$$

$$-m^2 \frac{1-\frac{4\alpha^2}{m^2}x^2}{1-x^2}\, v + (\lambda - 2\alpha)\, v - 2(2\alpha+1)\, x\, v' + (1-x^2)\, v'' = 0.$$

We see that the term $\frac{1}{1-x^2}$ entirely cancels for $\alpha = \frac{m}{2}$, which fixes the substitution

$$y = \sqrt{1 - x^2}^{\,m}\, v.\tag{A.25}$$

Note that for rotationally symmetric solutions around the Cartesian $z$ coordinate, the choice $m = 0$ would ensure a constant azimuthal part $\Phi\_m = \mathrm{const}$. Reinserting $x = \zeta = \cos\vartheta$, the preceding factor $\sqrt{1-\cos^2\vartheta}^{\,m} = \sin^m\vartheta$ is understandably required to represent shapes that aren't rotationally symmetric around $z$, but around any other, freely rotated axis, for which we also required the sinusoids in 2D. The differential equation for $v = v(\cos\vartheta)$ is

$$(1 - x^2)\, v'' - 2(m+1)\, x\, v' + \left[\lambda - m(m+1)\right] v = 0.\tag{A.26}$$

Still, the above equation is singular at $x = \pm 1$, which means that the second-derivative term multiplied by $(1 - x^2)$ vanishes there, rendering the differential equation into a first-order differential equation, locally. Instead of the more comprehensive Frobenius method we keep it simple: the desired spherical polynomials $Y\_n^m = \Phi\_m\, \Theta\_n^m$ are $n$th-order polynomials $\mathrm{P}\_n(\theta\_x, \theta\_y, \theta\_z)$ in the direction components. As $\Phi\_m(\varphi)$ is, up to normalization, an $m$th-order polynomial $\mathrm{P}\_m(\theta\_x, \theta\_y)$ divided by $\sqrt{1-\theta\_z^2}^{\,m}$, the zenith factor $\Theta\_n^m$ must contain $\sqrt{1-\theta\_z^2}^{\,m}\, \mathrm{P}\_{n-m}(\theta\_z)$ for $Y\_n^m$ to be polynomial and $n$th-order; in condensed notation this is $y = \sqrt{1-x^2}^{\,m} \sum\_{k=0}^{n-m} a\_k\, x^k$, see also [4].

*Power-series for v*. With $v = \sum\_{k=0}^{\infty} a\_k\, x^k$, we get after inserting and differentiating

$$\begin{aligned} \left(1 - x^2\right) \sum\_{k=2}^{\infty} k(k-1) \, a\_k \, \mathbf{x}^{k-2} - 2(m+1) \mathbf{x} \sum\_{k=1}^{\infty} k \, a\_k \, \mathbf{x}^{k-1} \\ &+ \left[\lambda - m(m+1)\right] \sum\_{k=0}^{\infty} a\_k \, \mathbf{x}^k = 0, \\ \sum\_{k=2}^{\infty} k(k-1) \, a\_k \, \mathbf{x}^{k-2} - \sum\_{k=2}^{\infty} k(k-1) \, a\_k \, \mathbf{x}^k - 2(m+1) \sum\_{k=1}^{\infty} k \, a\_k \, \mathbf{x}^k \\ &+ \left[\lambda - m(m+1)\right] \sum\_{k=0}^{\infty} a\_k \, \mathbf{x}^k = 0. \end{aligned}$$

For *k* ≥ 2, all sum terms are present and the comparison of coefficients for the *k*th power yields:

$$\begin{aligned} (k+1)(k+2)\, a\_{k+2} &= \left[ k(k-1) + 2(m+1)k - \left[ \lambda - m(m+1) \right] \right] a\_k,\\ a\_{k+2} &= \frac{k(k+2m+1) + m(m+1) - \lambda}{(k+1)(k+2)}\, a\_k. \end{aligned}$$

Typically for such a two-step recurrence, two starting conditions *a*<sup>0</sup> = 1, *a*<sup>1</sup> = 0 and *a*<sup>0</sup> = 0, *a*<sup>1</sup> = 1 yield a pair of linearly independent solutions (even and odd).

If the series in $x$ should converge, it will most certainly do so when $v$ is *polynomial* and stops at some order. To design $y$ to be of some arbitrary finite order $n$, we take into account that $\sqrt{1-x^2}^{\,m}$ is of $m$th order already, so the polynomial $v$ must be of $(n-m)$th order, and $|m| \le n$. The series is forced to stop at the coefficient $a\_k$ for $k = n - m$ if the numerator of the recurrence is forced to become zero by a suitably chosen $\lambda$, thus $\lambda = (n-m)(n+m+1) + m(m+1) = n(n+1)$. Corresponding to the termination either at an even or odd $k = n - m$, either the even $a\_0 = 1$, $a\_1 = 0$ or the odd $a\_0 = 0$, $a\_1 = 1$ starting conditions must be chosen. The otherwise wrong-parity solution is an infinite series [5, Eq. 3.2.45] whose convergence radius $R$ indicates singularities at $x = \pm 1$,

$$R = \lim\_{k \to \infty} \frac{a\_k}{a\_{k+2}} = \lim\_{k \to \infty} \frac{(k+1)(k+2)}{k(k+2m+1) + m(m+1) - n(n+1)} = \lim\_{k \to \infty} \frac{k^2 + \dotsb}{k^2 + \dotsb} = 1. \tag{A.27}$$
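The termination behavior is easy to reproduce. The following Python sketch (a hypothetical helper, not from the book) runs the two-step recurrence with $\lambda = n(n+1)$ and shows that the series stops at $k = n - m$:

```python
def series_coeffs(n, m, kmax=20):
    """Coefficients a_k of the polynomial part v from the two-step recurrence,
    with lambda = n(n+1); the start parity is matched to the parity of n - m."""
    lam = n * (n + 1)
    a = [0.0] * (kmax + 3)
    a[(n - m) % 2] = 1.0                     # even: a_0 = 1, odd: a_1 = 1
    for k in range(kmax + 1):
        a[k + 2] = (k * (k + 2 * m + 1) + m * (m + 1) - lam) \
                   / ((k + 1) * (k + 2)) * a[k]
    return a

# for n = 4, m = 2 the series must stop at k = n - m = 2
a = series_coeffs(4, 2)
```

Here $a\_2 = -7$ and all coefficients beyond $k = 2$ vanish, so $v$ is a polynomial of order $n - m$.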

Using $\lambda = n(n+1)$ and writing the differentials in condensed form, the defining differential equations for the associated Legendre functions $P\_n^m$ ($m$ is no exponent but a second index) and their polynomial part $v\_n^m$ become

$$\frac{\mathrm{d}}{\mathrm{d}x}\left[ (1-x^{2}) \frac{\mathrm{d}}{\mathrm{d}x} P\_{n}^{m} \right] + \left[ n(n+1) - \frac{m^{2}}{1-x^{2}} \right] P\_{n}^{m} = 0,\quad (\text{A.28})$$

$$(1-x^2)^{-m}\frac{\mathrm{d}}{\mathrm{d}x}\left[(1-x^2)^{m+1}\frac{\mathrm{d}}{\mathrm{d}x}v\_n^m\right] + \left[n(n+1) - m(m+1)\right]v\_n^m = 0. \quad (\text{A.29})$$

*Orthogonality of associated Legendre functions*. The resulting associated Legendre differential equation

$$\left[ (1 - \boldsymbol{x}^2) \left[ \boldsymbol{P}\_n^m \right]' \right]' + \left[ n(n+1) - \frac{m^2}{1 - \boldsymbol{x}^2} \right] \boldsymbol{P}\_n^m = 0,$$

yields a sequence of finite-order functions $P\_n^m$ with the order $n \in \mathbb{N}\_0$ and $|m| \le n$. Before even defining these functions, we can prove their orthogonality $\int\_{-1}^{1} P\_n^m\, P\_l^m\, \mathrm{d}x = 0$ for $n \neq l$. This means no product of a pair of associated Legendre functions of different indices $n \neq l$ produces any constant part on $x \in [-1; 1]$, and $P\_n^m$ and $P\_l^m$ do not contain shapes of the respective other function. This is important to uniquely decompose shapes and to define transformation integrals. We multiply the differential equation with $P\_l^m$ and integrate it over $x$

$$\int\_{-1}^{1} \left[ (1 - x^2) \left[ P\_n^m \right]' \right]' P\_l^m \, \mathrm{d}x + \int\_{-1}^{1} \left[ n(n+1) - \frac{m^2}{1 - x^2} \right] P\_n^m P\_l^m \, \mathrm{d}x = 0.$$

Integration by parts of the first integral yields

$$\int\_{-1}^{1} \left[ (1 - \mathbf{x}^2) \left[ P\_n^m \right]' \right]' P\_l^m \, \mathbf{d}x = \underbrace{(1 - \mathbf{x}^2) \left[ P\_n^m \right]' P\_l^m \Big|\_{-1}^{1}}\_{=0} - \int\_{-1}^{1} (1 - \mathbf{x}^2) \left[ P\_n^m \right]' \left[ P\_l^m \right]' \, \mathbf{d}x,\tag{A.30}$$

where the part vanishes because $(1 - x^2) = 0$ at the endpoints $x = \pm 1$, where $[P\_n^m]'$ and $P\_l^m$ are finite. We get

$$\int\_{-1}^{1} (1 - x^2) \left[ P\_n^m \right]' \left[ P\_l^m \right]' \mathrm{d}x = \int\_{-1}^{1} \left[ n(n+1) - \frac{m^2}{1 - x^2} \right] P\_n^m P\_l^m \, \mathrm{d}x.$$

We could have arrived at an alternative expression, with the only difference in *l*(*l* + 1) instead of *n*(*n* + 1),

$$\int\_{-1}^{1} (1 - \mathbf{x}^2) \left[ P\_l^m \right]^\prime \left[ P\_n^m \right]^\prime \mathbf{d}x = \int\_{-1}^{1} \left[ l(l+1) - \frac{m^2}{1 - \mathbf{x}^2} \right] P\_l^m P\_n^m \mathbf{d}x,$$

if we had instead started by integrating the differential equation of $P\_l^m$ multiplied with $P\_n^m$. The difference of both equations is

$$
\left[n(n+1) - l(l+1)\right] \int\_{-1}^{1} P\_n^m P\_l^m \, \mathrm{d}x = 0,
$$

and the scalar factor in brackets only vanishes for $n = l$. For the equation to hold at any other $n \neq l$, we conclude that the associated Legendre functions must be orthogonal, $\int\_{-1}^{1} P\_n^m\, P\_l^m\, \mathrm{d}x = 0$. (Orthogonality need not hold for different $m$, as $\Phi\_m$ achieves this orthogonality in azimuth.)

*Solving for the polynomial part of the associated Legendre functions*. To solve the differential equation for the polynomial part $v\_n^m$ in a way that arrives at the elegant Rodrigues formula, we first play with a test function

$$
u\_n = (1 - x^2)^n, \quad \text{with differential} \quad u\_n' = -2n\, x\, (1 - x^2)^{n-1} = -2n \left(1 - x^2\right)^{-1} x\, u\_n.
$$

We may write its derivative as a differential equation

$$(1 - x^2)\, u\_n' + 2n\, x\, u\_n = 0$$

and derive it $l$ times by the Leibniz rule $(fg)^{(n)} = \sum\_{k=0}^{n}\binom{n}{k} f^{(k)}\, g^{(n-k)}$ for repeated differentiation of products, with the binomial coefficient $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ and $f^{(k)} = \frac{\mathrm{d}^k f}{\mathrm{d}x^k}$ for simplicity. The few non-zero derivatives of $x$ and $(1 - x^2)$ simplify the differentiation: $x' = 1$, $(1 - x^2)' = -2x$, $(1 - x^2)'' = -2$,

$$\begin{aligned} (1-x^2)\, u\_n^{(l+1)} + l\,(-2x)\, u\_n^{(l)} + \frac{(l-1)\,l}{2}(-2)\, u\_n^{(l-1)} + 2n\, x\, u\_n^{(l)} + 2n\, l\, u\_n^{(l-1)} &= 0, \\ (1 - x^2)\, u\_n^{(l+1)} - 2(l-n)\, x\, u\_n^{(l)} + l(2n - l + 1)\, u\_n^{(l-1)} &= 0. \end{aligned}$$

This equation matches $(1 - x^2)\, [v\_n^m]'' - 2(m + 1)\, x\, [v\_n^m]' + [n(n+1) - m(m+1)]\, v\_n^m = 0$ by identifying the coefficients $l - n = m + 1$, hence $l = m + n + 1$, which nicely implies $l(2n - l + 1) = n(n+1) - m(m+1)$,

$$(1-x^{2})\, u\_{n}^{(m+n+2)} - 2(m+1)\, x\, u\_{n}^{(m+n+1)} + \left[n(n+1) - m(m+1)\right] u\_{n}^{(m+n)} = 0.$$

We therefore find the solutions $v\_n^m = u\_n^{(n+m)} = \frac{\mathrm{d}^{n+m}}{\mathrm{d}x^{n+m}}(1 - x^2)^n$, yielding $y\_n^m = \sqrt{1 - x^2}^{\,m}\, \frac{\mathrm{d}^{n+m}}{\mathrm{d}x^{n+m}}(1 - x^2)^n$.

*Rodrigues formula*. By virtue of the above, the Rodrigues formula for the associated Legendre functions $P\_n^m$ becomes

$$P\_n^m = \frac{(-1)^{n+m}}{2^n\, n!} \sqrt{1 - x^2}^{\,m}\, \frac{\mathrm{d}^{n+m}}{\mathrm{d}x^{n+m}} (1 - x^2)^n \tag{A.31}$$

$$\text{or} \quad P\_n^m = (-1)^m \sqrt{1 - x^2}^{\,m} \frac{\mathrm{d}^m}{\mathrm{d}x^m} P\_n, \quad \text{with} \quad P\_n = \frac{(-1)^n}{2^n n!} \frac{\mathrm{d}^n}{\mathrm{d}x^n} (1 - x^2)^n,$$

and $P\_n = P\_n^0$ are the Legendre polynomials. The Legendre polynomials are normalized to $P\_n(1) = 1$ by the factor $\frac{(-1)^n}{2^n\, n!}$. Because $(1 - x^2)$ is zero at $x = 1$ with any positive integer exponent, only the part of the $n$-fold derivative that exclusively hits the power of $(1 - x^2)^n$ all $n$ times contributes to its value there: $n!\,(-2x)^n (1 - x^2)^0\big|\_{x=1} = n!\, 2^n (-1)^n$. The scaling of the associated Legendre functions with $m > 0$ is somewhat more arbitrary in sign and value.

*Indices $n$ and $m$*. The boundaries for the index $m \in \mathbb{Z}$ of the associated Legendre functions are typically $-n \le m \le n$; however, as the eigenvalue is only shifted by $\frac{m^2}{1-x^2}$, functions for positive and negative $m$ are linearly dependent. We observe this by inspecting the highest-order terms in [6]

$$\begin{split} 2^n n! \sqrt{1 - \mathbf{x}^2}^m P\_n^m &= (-1)^{n+m} (1 - \mathbf{x}^2)^m \frac{\mathbf{d}^{n+m} (1 - \mathbf{x}^2)^n}{\mathbf{d} \mathbf{x}^{n+m}} \\ &= \mathbf{x}^{2m} \frac{\mathbf{d}^{n+m}}{\mathbf{d} \mathbf{x}^{n+m}} \bigg[ \mathbf{x}^{2n} - \dotsb \bigg] = \mathbf{x}^{2m} \left[ \frac{(2n)!}{(n-m)!} \mathbf{x}^{n-m} - \dotsb \right] \\ 2^n n! \sqrt{1 - \mathbf{x}^2}^m P\_n^{-m} &= (-1)^{n+m} \frac{\mathbf{d}^{n-m} (1 - \mathbf{x}^2)^n}{\mathbf{d} \mathbf{x}^{n-m}} \end{split}$$

$$=(-1)^{m} \frac{\mathbf{d}^{n-m}}{\mathbf{d}x^{n-m}} \Big[ \mathbf{x}^{2n} - \dots \Big] = (-1)^{m} \left[ \frac{(2n)!}{(n+m)!} \mathbf{x}^{n+m} - \dots \right]$$

$$\implies P\_n^{-m} = (-1)^m \frac{(n-m)!}{(n+m)!} P\_n^m \tag{A.32}$$

and to avoid confusion, it is convenient to only use $m \ge 0$, or $|m|$, to evaluate the associated Legendre functions.
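The Rodrigues formula (A.31) and the symmetry relation (A.32) can be verified with exact polynomial arithmetic. This Python sketch (illustrative only; the helper names are made up) differentiates the binomially expanded $(1-x^2)^n$ coefficient-wise:

```python
import math

def dpoly(c, times):
    """Differentiate a polynomial given as a coefficient list (c[k] is the x^k coefficient)."""
    for _ in range(times):
        c = [k * c[k] for k in range(1, len(c))]
    return c

def peval(c, x):
    """Evaluate the coefficient list at x."""
    return sum(ck * x**k for k, ck in enumerate(c))

def assoc_legendre(n, m, x):
    """Associated Legendre function via the Rodrigues formula (A.31); m may be negative."""
    c = [0.0] * (2 * n + 1)
    for k in range(n + 1):                    # binomial expansion of (1 - x^2)^n
        c[2 * k] = math.comb(n, k) * (-1.0) ** k
    d = dpoly(c, n + m)
    return ((-1.0) ** (n + m) / (2 ** n * math.factorial(n))
            * (1.0 - x * x) ** (m / 2.0) * peval(d, x))

x, n, m = 0.4, 3, 2
lhs = assoc_legendre(n, -m, x)                                  # P_3^{-2}
rhs = ((-1.0) ** m * math.factorial(n - m) / math.factorial(n + m)
       * assoc_legendre(n, m, x))                               # right side of (A.32)
p3 = assoc_legendre(3, 0, 1.0)                                  # P_3(1), normalized to 1
```

Both sides of (A.32) agree, and the normalization $P\_n(1) = 1$ is reproduced.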

*Alternative definition: three-term recurrence*. Any polynomial $\mathrm{P}\_n$ of the order $n$ can be decomposed into Legendre polynomials $\mathrm{P}\_n = \sum\_{i=0}^n c\_i\, P\_i$, and the Legendre polynomial $P\_j$ is orthogonal to all of these, $\int\_{-1}^{1}\mathrm{P}\_n\, P\_j\, \mathrm{d}x = 0$, if $j > n$. With this knowledge it is interesting to describe $\int\_{-1}^{1}(x\, P\_i)\, P\_j\, \mathrm{d}x$. As $(x\, P\_i)$ is of $(i+1)$th order, the integral must vanish for $j > i + 1$. Because of commutativity, $\int\_{-1}^{1} P\_i\, (x\, P\_j)\, \mathrm{d}x$, with $(x\, P\_j)$ being of $(j+1)$th order, also vanishes for $i > j + 1$. Hereby, the re-expansion of $x\, P\_n$ can maximally use three terms, $x\, P\_n = \alpha\, P\_{n-1} + \gamma\, P\_n + \beta\, P\_{n+1}$. In fact, only two terms remain, as the $P\_{2k}$ are even functions on $x \in [-1; 1]$ and the $P\_{2k+1}$ are odd, thus orthogonal. The product $x\, P\_n$ changes the parity of $P\_n$, leaving $x\, P\_n = \alpha\, P\_{n-1} + \beta\, P\_{n+1}$. At $x = 1$, all polynomials were normalized to $P\_i(1) = 1$, therefore evaluation at $x = 1$ leaves $1 = \alpha + \beta$, so $\alpha = 1 - \beta$, hence

$$x\, P\_n = \beta\_n \, P\_{n+1} + (1 - \beta\_n) \, P\_{n-1}.$$

As the associated Legendre functions $P\_n^m$ for a specific $m$ are also orthogonal, the recurrence is more general

$$x\, P\_n^m = \beta\_n^m \, P\_{n+1}^m + (1 - \beta\_n^m) \, P\_{n-1}^m.$$

To determine the coefficient $\beta\_n^m$, we only need to find out how the highest-power coefficients $x^{n-m+1}$ of the polynomial parts in $x\,P\_n^m$ and $P\_{n+1}^m$ are related. We see this after inserting $P\_n^m = \sqrt{1-x^2}^{\,m}\,\frac{(-1)^{n+m}}{2^n n!}\,\frac{\mathrm{d}^{n+m}}{\mathrm{d}x^{n+m}}(1-x^2)^n$ and division by $\sqrt{1-x^2}^{\,m}\,\frac{(-1)^{n+m}}{2^n n!}$, which leaves a recurrence for the polynomial part

$$\underbrace{x\, v\_{n}^{m}}\_{O=n-m+1} = -\underbrace{\frac{\beta\_{n}^{m}}{2(n+1)}\, v\_{n+1}^{m}}\_{O=n-m+1} - \underbrace{2n\,(1-\beta\_{n}^{m})\, v\_{n-1}^{m}}\_{O=n-m-1}.$$

Of the highest powers $x^{n-m+1}$ in both $x\,v\_n^m$ and $v\_{n+1}^m$, the coefficients $c\_{n,n-m}^m$ and $c\_{n+1,n-m+1}^m$ define

$$
\beta\_n^m = -2(n+1)\frac{c\_{n,n-m}^m}{c\_{n+1,n-m+1}^m}.
$$

To find it, we binomially expand $(1-x^2)^n = (-1)^n \sum\_{k=0}^{n} (-1)^k \binom{n}{k}\, x^{2(n-k)}$,

$$\frac{v\_n^m}{(-1)^n} = \frac{\mathrm{d}^{n+m}}{\mathrm{d}x^{n+m}} \sum\_{k=0}^n \binom{n}{k} (-1)^k x^{2(n-k)} = \sum\_{k=0}^{\lfloor \frac{n-m}{2} \rfloor} \binom{n}{k} \frac{(2n-2k)!\, (-1)^k}{(n-m-2k)!}\, x^{n-m-2k},$$

so that with $k=0$ we can find $c\_{n,n-m}^m = (-1)^n \binom{n}{0} \frac{(2n)!}{(n-m)!} = (-1)^n \frac{(2n)!}{(n-m)!}$ for the highest-power coefficient of $v\_n^m$. Accordingly, the coefficient of the recurrence is

$$
\beta\_n^m = 2(n+1)\frac{(n-m+1)}{(2n+1)(2n+2)} = \frac{n-m+1}{2n+1},
$$

hence with $1-\beta\_n^m = \frac{n+m}{2n+1}$ and $x\,P\_n^m = \frac{n-m+1}{2n+1}\,P\_{n+1}^m + \frac{n+m}{2n+1}\,P\_{n-1}^m$, we can construct $P\_n^m$ recursively by

$$P\_{n+1}^{m} = \frac{2n+1}{n-m+1} \, x\, P\_n^m - \frac{n+m}{n-m+1} \, P\_{n-1}^m. \tag{A.33}$$

The start value is $P\_n^n = \frac{(-1)^{2n}}{2^n n!}\sqrt{1-x^2}^{\,n}\,\frac{\mathrm{d}^{2n}}{\mathrm{d}x^{2n}}(1-x^2)^n = \frac{(-1)^n\,(2n)!}{2^n n!}\sqrt{1-x^2}^{\,n}$, and for $n=m$, the term $P\_{n-1}^m$ is excluded.
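
The recurrence (A.33) with this start value translates directly into code. The following Python sketch (the function name and the closed-form cross-checks are ours, for illustration only) evaluates $P\_n^m$ for $m \ge 0$ with the Condon–Shortley phase:

```python
import math
import numpy as np

def assoc_legendre(n, m, x):
    """P_n^m(x), m >= 0, Condon-Shortley phase, via recurrence (A.33)."""
    x = np.asarray(x, dtype=float)
    # start value P_m^m = (-1)^m (2m)!/(2^m m!) sqrt(1-x^2)^m
    p_prev = np.zeros_like(x)   # plays the role of the excluded P_{m-1}^m = 0
    p_curr = (-1)**m * math.factorial(2*m) / (2**m * math.factorial(m)) \
             * np.sqrt(1.0 - x**2)**m
    for k in range(m, n):       # raise the order k -> k+1 by (A.33)
        p_next = ((2*k + 1) * x * p_curr - (k + m) * p_prev) / (k - m + 1)
        p_prev, p_curr = p_curr, p_next
    return p_curr

# cross-checks against closed forms:
x = np.linspace(-1, 1, 5)
assert np.allclose(assoc_legendre(2, 0, x), 0.5 * (3*x**2 - 1))          # P_2
assert np.allclose(assoc_legendre(2, 1, x), -3 * x * np.sqrt(1 - x**2))  # P_2^1
assert np.allclose(assoc_legendre(3, 3, x), -15 * np.sqrt(1 - x**2)**3)  # P_3^3
```

The loop treats $P\_{m-1}^m$ as zero, which reproduces the exclusion of that term in the first step.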

*Normalization*. A unity square integral (orthonormalization) simplifies the definition of transform integrals. We would like to obtain the corresponding factor $N\_n^m$ with

$$\int\_{-1}^{1} (P\_n^m N\_n^m)^2 \,\mathrm{d}x = 1.$$

Normalization for *m* = 0 is easy to find by repeated integration by parts

$$\frac{2^{2n}n!^2}{(N\_n)^2} = 2^{2n}n!^2 \int\_{-1}^1 (P\_n)^2 \mathrm{d}x = \underbrace{\left[ (1-x^2)^n \right]^{(n-1)} \left[ (1-x^2)^n \right]^{(n)} \Big|\_{-1}^1}\_{=0}$$

$$-\int\_{-1}^1 \left[ (1-x^2)^n \right]^{(n-1)} \left[ (1-x^2)^n \right]^{(n+1)} \mathrm{d}x$$

$$= \dots = (-1)^n \int\_{-1}^1 (1-x^2)^n \left[ (1-x^2)^n \right]^{(2n)} \mathrm{d}x$$

$$= \int\_{-1}^1 (1-x^2)^n \left( 2n \right)! \mathrm{d}x = (2n)! \int\_0^\pi \sin^{2n} \vartheta \cdot \sin \vartheta \, \mathrm{d}\vartheta.$$

With the integral $\int\_0^{\pi} \sin^{2n+1}\vartheta\,\mathrm{d}\vartheta = 2\,\frac{(2n)!!}{(2n+1)!!} = 2\,\frac{2^{2n}\,n!^2}{(2n+1)!}$, this is $(N\_n)^{-2} = 2\,\frac{(2n)!}{(2n+1)!} = \frac{2}{2n+1}$. For $N\_n^m$, a trick inserting the relation between $P\_n^m$ and $P\_n^{-m}$ is used [6], with integration by parts until the differentials are of the same order

$$\frac{1}{(N\_n^m)^2} = \int\_{-1}^1 P\_n^m P\_n^m \, \mathrm{d}x = \int\_{-1}^1 P\_n^m\, \frac{(-1)^m (n+m)!}{(n-m)!}\, P\_n^{-m} \, \mathrm{d}x$$

$$\begin{split} &= \frac{1}{2^{2n}n!^{2}}\, \frac{(-1)^{m}(n+m)!}{(n-m)!} \int\_{-1}^{1} \left[(1-x^{2})^n\right]^{(n+m)} \left[ (1-x^{2})^n \right]^{(n-m)} \, \mathrm{d}x = \cdots \\ &= \frac{(n+m)!}{(n-m)!}\, \underbrace{\int\_{-1}^{1} \frac{1}{2^{2n}n!^{2}} \left[(1-x^{2})^n\right]^{(n)} \left[(1-x^{2})^n\right]^{(n)} \, \mathrm{d}x}\_{=1/(N\_{n})^{2}} = \frac{2}{2n+1}\, \frac{(n+m)!}{(n-m)!} \\ &\Rightarrow N\_{n}^{m} = (-1)^{m} \sqrt{\frac{2n+1}{2}\, \frac{(n-m)!}{(n+m)!}} \end{split} \tag{A.34}$$

The (−1)*<sup>m</sup>* can be excluded if not used in the Rodrigues formula. (*It is always a wise idea to check and compare signs as conventions may differ* …*in practice* (−1)*<sup>m</sup> is a rotation around z by* 180◦.)

#### *A.3.7 Spherical Harmonics*

With all the above definitions, we obtain the fully normalized spherical harmonics

$$Y\_n^m(\varphi, \vartheta) = N\_n^{|m|} \ P\_n^{|m|}(\cos \vartheta) \,\,\Phi\_m(\varphi) \tag{A.35}$$

*Orthonormality*. They are *orthonormal* when integrated over the sphere

$$\int\_{\mathbb{S}^2} Y\_n^m\, Y\_{n'}^{m'} \, \mathrm{d}\cos\vartheta \, \mathrm{d}\varphi = \delta\_{nn'}\, \delta\_{mm'}.\tag{A.36}$$
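
Orthonormality is easy to verify numerically. The sketch below is an assumption-laden illustration: it requires `scipy`, and it assumes a common real-valued azimuth convention $\Phi\_m(\varphi) = \cos(m\varphi)/\sqrt{\pi}$ for $m>0$, $\sin(|m|\varphi)/\sqrt{\pi}$ for $m<0$, and $1/\sqrt{2\pi}$ for $m=0$; `scipy.special.lpmv` includes the Condon–Shortley phase, which the $(-1)^m$ in $N\_n^{|m|}$ cancels:

```python
import math
import numpy as np
from scipy.special import lpmv  # associated Legendre with Condon-Shortley phase

def real_sh(n, m, phi, xi):
    """Real Y_n^m(phi, theta) with xi = cos(theta), per (A.35); Phi_m is assumed."""
    am = abs(m)
    N = (-1)**am * math.sqrt((2*n + 1) / 2
                             * math.factorial(n - am) / math.factorial(n + am))
    if m > 0:
        Phi = np.cos(m * phi) / math.sqrt(math.pi)
    elif m < 0:
        Phi = np.sin(am * phi) / math.sqrt(math.pi)
    else:
        Phi = np.ones_like(phi) / math.sqrt(2 * math.pi)
    return N * lpmv(am, n, xi) * Phi

# quadrature grid: Gauss-Legendre in cos(theta), trapezoid in phi (both exact here)
xi, w = np.polynomial.legendre.leggauss(16)
phi = np.linspace(0, 2*np.pi, 64, endpoint=False)
PHI, XI = np.meshgrid(phi, xi)
W = np.outer(w, np.full(phi.size, 2*np.pi / phi.size))

nm = [(n, m) for n in range(4) for m in range(-n, n+1)]
G = np.array([[np.sum(real_sh(*a, PHI, XI) * real_sh(*b, PHI, XI) * W)
               for b in nm] for a in nm])
assert np.allclose(G, np.eye(len(nm)))   # Gram matrix is the identity, (A.36)
```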

*Transform integral*. Because of their completeness in the Hilbert space, any square-integrable function $g(\varphi,\vartheta)$ can be decomposed by

$$g(\varphi,\vartheta) = \sum\_{n'=0}^{\infty} \sum\_{m'=-n'}^{n'} \gamma\_{n'm'}\, Y\_{n'}^{m'}(\varphi,\vartheta). \tag{A.37}$$

From a known function $g(\varphi,\vartheta)$, the coefficients are obtained by integrating $g$ with another spherical harmonic $Y\_n^m$ over the unit sphere $\mathbb{S}^2$, i.e. $\int\_{-1}^{1}\mathrm{d}\cos\vartheta\int\_0^{2\pi}\mathrm{d}\varphi$. For a simple notation, we gather the two variables in a direction vector $\theta = [\cos\varphi\sin\vartheta,\ \sin\varphi\sin\vartheta,\ \cos\vartheta]^{\mathrm{T}}$ and write

$$\int\_{\mathbb{S}^2} g(\theta) \, Y\_n^m \, \mathrm{d}\theta = \sum\_{n'=0}^{\infty} \sum\_{m'=-n'}^{n'} \gamma\_{n'm'} \underbrace{\int\_{\mathbb{S}^2} Y\_{n'}^{m'} Y\_n^m \, \mathrm{d}\theta}\_{\delta\_{nn'}\delta\_{mm'}} = \gamma\_{nm}. \tag{A.38}$$

*Parseval's theorem*. Due to orthonormality, the integral norm of any pattern $g(\theta)$ composed as $\sum\_{n=0}^{\infty}\sum\_{m=-n}^{n}\gamma\_{nm}\,Y\_n^m(\theta)$ is equivalent to

$$\int\_{\mathbb{S}^2} |g(\theta)|^2 \,\mathrm{d}\theta = \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} |\gamma\_{nm}|^2 \tag{A.39}$$

because $\int\_{\mathbb{S}^2}\sum\_{n,n',m,m'}\gamma\_{nm}\,\gamma\_{n'm'}^{\*}\,Y\_n^m(\theta)\,Y\_{n'}^{m'}(\theta)\,\mathrm{d}\theta = \sum\_{n,n',m,m'}\gamma\_{nm}\,\gamma\_{n'm'}^{\*}\,\delta\_{nn'}\delta\_{mm'}$.

*3D panning functions: Dirac delta on the sphere*. An infinitely narrow range around the desired direction $\theta\_\mathrm{s}$ can be described by limiting the dot product $\theta\_\mathrm{s}^{\mathrm{T}}\theta > \cos\varepsilon \to 1$. A unit-surface Dirac delta distribution $\delta(1-\theta\_\mathrm{s}^{\mathrm{T}}\theta)$ can be described as

$$\delta(1-\theta\_s^{\mathrm{T}}\theta) = \frac{1}{2\pi} \begin{cases} \lim\_{\varepsilon \to 0} \frac{1}{1-\cos\varepsilon}, & \text{for } \arccos\theta\_s^{\mathrm{T}}\theta < \varepsilon\\ 0, & \text{otherwise.} \end{cases} \tag{A.40}$$

And its coefficients are found by the transformation integral

$$\gamma\_{nm} = \int\_{\mathbb{S}^2} \delta(1-\theta\_\mathrm{s}^{\mathrm{T}}\theta)\, Y\_n^m(\theta) \, \mathrm{d}\theta = Y\_n^m(\theta\_\mathrm{s})\, \lim\_{\varepsilon \to 0} \frac{\int\_0^{2\pi} \mathrm{d}\varphi \int\_{\cos \varepsilon}^1 \mathrm{d}\xi}{2\pi\,(1-\cos\varepsilon)} = Y\_n^m(\theta\_\mathrm{s}). \tag{A.41}$$

Typically, a finite-order panning function with $n \le \mathrm{N}$ employs weights $a\_n$ to reduce side lobes

$$\text{g}\_{\text{N}}(\boldsymbol{\theta}) = \sum\_{n=0}^{\text{N}} \sum\_{m=-n}^{n} a\_{n} \, Y\_{n}^{m}(\boldsymbol{\theta}\_{\text{s}}) \, Y\_{n}^{m}(\boldsymbol{\theta}). \tag{A.42}$$

Assuming the panning direction is $\theta\_\mathrm{s} = [0, 0, 1]^{\mathrm{T}}$, we get the axisymmetric panning function, with $Y\_n^0 = \sqrt{\frac{2n+1}{4\pi}}\, P\_n$,

$$g\_{\mathcal{N}}(\vartheta) = \frac{1}{2\pi} \sum\_{n=0}^{\mathcal{N}} \frac{2n+1}{2} \, a\_n \, P\_n(\cos \vartheta). \tag{A.43}$$

We can evaluate its $E$ measure by integrating $g\_\mathrm{N}^2$ over the sphere; the constant scale factor $2\pi$ used here cancels in $r\_\mathrm{E}$ below:

$$E = 2\pi \underbrace{\int\_0^{2\pi} \mathrm{d}\varphi}\_{2\pi} \int\_{-1}^1 g\_{\mathrm{N}}(\xi)^2 \,\mathrm{d}\xi = \sum\_{i,j} \frac{2i+1}{2}\, \frac{2j+1}{2} \, a\_i \, a\_j \underbrace{\int\_{-1}^1 P\_i \, P\_j \, \mathrm{d}\xi}\_{\delta\_{ij} \frac{2}{2i+1}} = \sum\_{n=0}^{\mathrm{N}} \frac{2n+1}{2}\, a\_n^2. \tag{A.44}$$

The $r\_\mathrm{E}$ measure is, because of the axisymmetry, perfectly aligned with $z$; therefore, its length is calculated with the same scaling as $E$, and with the recurrence $\xi\, P\_j = \frac{(j+1)P\_{j+1} + j\,P\_{j-1}}{2j+1}$ inserted,

$$r\_{\mathrm{E}} = \frac{4\pi^2 \int\_{-1}^{1} g\_{\mathrm{N}}(\xi)^2 \, \xi \, \mathrm{d}\xi}{E} = \frac{\sum\_{i,j} \frac{2i+1}{2} \frac{2j+1}{2} \, a\_i \, a\_j \int\_{-1}^{1} P\_i \, \xi P\_j \, \mathrm{d}\xi}{E}$$

$$= \frac{\sum\_{i,j} \frac{2i+1}{4} \, a\_i \, a\_j \int\_{-1}^{1} P\_i \, [(j+1)P\_{j+1} + jP\_{j-1}] \,\mathrm{d}\xi}{E}$$

$$= \frac{\sum\_{n=0}^{\mathrm{N}} [n \, a\_n \, a\_{n-1} + (n+1) \, a\_n \, a\_{n+1}]}{2E} = \frac{\sum\_{n=1}^{\mathrm{N}} n \, a\_n \, a\_{n-1}}{E}. \tag{A.45}$$
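
Both measures are easy to cross-check numerically against a direct integration of the axisymmetric panning function (A.43); the weights below are an arbitrary example of ours:

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

N = 3
a = np.array([1.0, 0.9, 0.7, 0.4])          # example side-lobe weights a_0..a_N

E = np.sum((2*np.arange(N+1) + 1) / 2 * a**2)              # closed form (A.44)
rE = np.sum(np.arange(1, N+1) * a[1:] * a[:-1]) / E        # closed form (A.45)

# reference: numerical integration of g_N(xi) from (A.43)
xi, w = leggauss(64)
gN = sum((2*n + 1) / 2 * a[n] * Legendre.basis(n)(xi) for n in range(N+1)) / (2*np.pi)
E_num = 4*np.pi**2 * np.sum(w * gN**2)      # 2*pi times the sphere integral of g^2
rE_num = 4*np.pi**2 * np.sum(w * gN**2 * xi) / E_num
assert np.isclose(E, E_num) and np.isclose(rE, rE_num)
```

Gauss–Legendre quadrature is exact here, since the integrands are low-order polynomials in $\xi$.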

*3D max-$r\_\mathrm{E}$*. For the narrowest-possible spread, we maximize $r\_\mathrm{E}$, which we decompose into $r\_\mathrm{E} = \frac{\hat{r}\_\mathrm{E}}{E}$, and we zero its derivative, as for 2D,

$$r\_{\rm E}^{\prime} = \frac{\hat{r}\_{\rm E}^{\prime}}{E} - \frac{\hat{r}\_{\rm E}}{E^2} E^{\prime} = \frac{1}{E} [\hat{r}\_{\rm E}^{\prime} - E^{\prime} r\_{\rm E}] = 0$$

$$n \, a\_{n-1} + (n+1) \, a\_{n+1} - (2n+1) \, a\_n \, r\_{\rm E} = 0. \tag{A.46}$$

If we assume that $a\_n = P\_n(\zeta)$, we see by the recurrence $(n+1)\,P\_{n+1} + n\, P\_{n-1} = (2n+1)\, P\_n\, \zeta$

$$(n+1) \, P\_{n+1} + n \, P\_{n-1} = (2n+1)\, P\_n \,\zeta = (2n+1)\, P\_n \, r\_{\mathrm{E}} \tag{A.47}$$

that $r\_\mathrm{E} = \zeta$ and $a\_n = P\_n(\zeta) = P\_n(r\_\mathrm{E})$. We maximize $r\_\mathrm{E}$ under the constraint that the weights end at the order N, i.e. $a\_{\mathrm{N}+1} = P\_{\mathrm{N}+1}(r\_\mathrm{E}) = 0$. Therefore, $r\_\mathrm{E}$ must be as close to 1 as possible and be a zero of the Legendre polynomial $P\_{\mathrm{N}+1}$. It can be discovered by a root-finding algorithm in MATLAB, e.g. Newton–Raphson, when the function $P\_n$ is implemented. In [7], the useful approximation $r\_\mathrm{E} = \cos\frac{2.4062}{\mathrm{N}+1.51} = \cos\frac{137.9^\circ}{\mathrm{N}+1.51}$ was given.
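
In Python, the zeros of $P\_{\mathrm{N}+1}$ are directly available from `numpy`, so the max-$r\_\mathrm{E}$ design reduces to a few lines (a sketch; the tolerance against the approximation from [7] is our choice):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

# max-rE: r_E is the largest zero of P_{N+1}; the weights are a_n = P_n(r_E).
for N in range(1, 8):
    rE = np.max(Legendre.basis(N + 1).roots())
    a = np.array([Legendre.basis(n)(rE) for n in range(N + 1)])
    assert a[0] == 1.0 and np.all(a > 0)      # positive weights, a_0 = 1
    rE_approx = np.cos(np.deg2rad(137.9) / (N + 1.51))
    assert abs(rE - rE_approx) < 5e-3         # approximation from [7]
```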

*Squared-norm mirror/rotation invariance.* The norm of any pattern $a(\theta)$ is invariant under an orthogonal coordinate transform (rotation/mirror) $\hat{\theta} = \boldsymbol{R}\,\theta$ with $\boldsymbol{R}^{\mathrm{T}}\boldsymbol{R} = \boldsymbol{I}$,

$$\int\_{\mathbb{S}^2} a^2(\theta) \,\mathrm{d}\theta = \int\_{\mathbb{S}^2} b^2(\theta) \,\mathrm{d}\theta,\qquad\text{with } b(\theta) = a(\mathcal{R}\,\theta). \tag{A.48}$$

The norm equivalence of the corresponding spherical harmonics coefficients α*nm* and β*nm* follows from Parseval's theorem

$$\sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} |\alpha\_{nm}|^2 = \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} |\beta\_{nm}|^2. \tag{A.49}$$

In vector notation of the coefficients, i.e. $\boldsymbol{\alpha} = [\alpha\_{00},\dots,\alpha\_{\mathrm{NN}}]^{\mathrm{T}}$ and $\boldsymbol{\beta} = [\beta\_{00},\dots,\beta\_{\mathrm{NN}}]^{\mathrm{T}}$, this is $\\|\boldsymbol{\alpha}\\|^2 = \\|\boldsymbol{\beta}\\|^2$. To fulfill this equivalence, both vectors are related by an orthogonal matrix $\boldsymbol{\beta} = \boldsymbol{Q}\,\boldsymbol{\alpha}$ with $\boldsymbol{Q}^{\mathrm{T}}\boldsymbol{Q} = \boldsymbol{I}$, and hence $\\|\boldsymbol{\beta}\\|^2 = \boldsymbol{\beta}^{\mathrm{T}}\boldsymbol{\beta} = \boldsymbol{\alpha}^{\mathrm{T}}\boldsymbol{Q}^{\mathrm{T}}\boldsymbol{Q}\,\boldsymbol{\alpha} = \boldsymbol{\alpha}^{\mathrm{T}}\boldsymbol{\alpha} = \\|\boldsymbol{\alpha}\\|^2$. Moreover, rotation/mirroring neither creates components of higher nor of lower orders, so that $\boldsymbol{Q}$ must be block structured

$$\mathcal{Q} = \begin{bmatrix} \mathcal{Q}\_0 & \mathbf{0} & \dots \\ \mathbf{0} & \mathcal{Q}\_1 & \\ \vdots & & \ddots \end{bmatrix} . \tag{A.50}$$

The order subspaces in *n* therefore stay de-coupled, so that the coefficient vectors for every order-subspace *n* are related and norm equivalent under mirror/rotation operations

$$\\|\boldsymbol{\alpha}\_n\\|^2 = \\|\boldsymbol{\beta}\_n\\|^2, \qquad \boldsymbol{\beta}\_n = \boldsymbol{Q}\_n\, \boldsymbol{\alpha}\_n, \qquad \boldsymbol{Q}\_n^{\mathrm{T}} \boldsymbol{Q}\_n = \boldsymbol{I}\_{2n+1}. \tag{A.51}$$

*Pseudo all-pass character of the Dirac delta*. Dirac delta distributions $\delta(\theta^{\mathrm{T}}\theta\_\mathrm{s} - 1)$ yield the coefficients $Y\_n^m(\theta\_\mathrm{s})$, and due to rotation invariance they yield constant energy in every spherical harmonic order $n$, regardless of the aiming $\theta\_\mathrm{s}$. One can determine the norm for zenithal aiming $\vartheta = 0$, i.e. $\theta\_\mathrm{z} = [0, 0, 1]^{\mathrm{T}}$, yielding a non-zero coefficient only for $m = 0$

$$Y\_n^m(\theta\_\mathrm{z}) = \sqrt{\frac{2n+1}{4\pi}}\, \overbrace{P\_n(1)}^{=1}\, \delta\_m = \sqrt{\frac{2n+1}{4\pi}}\, \delta\_m.$$

Because of the rotation invariance we recognize a pseudo-allpass character (Unsöld theorem) of the spherical harmonics of any order *n*

$$\sum\_{m=-n}^{n} |Y\_n^m(\theta\_\mathrm{s})|^2 = \sum\_{m=-n}^{n} |Y\_n^m(\theta\_\mathrm{z})|^2 = \frac{2n+1}{4\pi} = (2n+1) \ |Y\_0^0(\theta\_\mathrm{s})|^2.$$

For encoded single-direction Ambisonic signals α*nm*(*t*), this implies

$$\sum\_{m=-n}^{n} \left| \alpha\_{nm}(t) \right|^2 = (2n+1) \left| \alpha\_{00}(t) \right|^2. \tag{A.52}$$

*Expected norm in the diffuse field*. An ideal diffuse sound field is composed of directional signals $a(\theta\_\mathrm{s}, t)$ from all directions $\theta\_\mathrm{s}$, with no correlation between signals from different directions, $\mathrm{E}\\{a(\theta\_1, t)\, a(\theta\_2, t)\\} = \sigma\_\mathrm{a}^2\, \delta(\theta\_1^{\mathrm{T}}\theta\_2 - 1)$. Its coefficients are obtained by the integral over the directions

$$\alpha\_{nm}(t) = \int\_{\mathbb{S}^2} a(\theta\_s, t) \, Y\_n^m(\theta\_s) \, \mathrm{d}\theta\_s,\tag{A.53}$$

and we can show that not only the expected directional signals but also the spherical harmonic coefficients are orthogonal, by $\mathrm{E}\\{a(\theta\_1, t)\, a(\theta\_2, t)\\} = \sigma\_\mathrm{a}^2\, \delta(\theta\_1^{\mathrm{T}}\theta\_2 - 1)$ and the orthonormality $\int\_{\mathbb{S}^2} Y\_n^m\, Y\_{n'}^{m'}\,\mathrm{d}\theta = \delta\_{nn'}\delta\_{mm'}$ of the spherical harmonics

$$\begin{split} \mathcal{E}\{\boldsymbol{\alpha}\_{nm}(t)\,\boldsymbol{\alpha}\_{n'm'}(t)\} &= \int\_{\mathbb{S}^2} \int\_{\mathbb{S}^2} \mathcal{E}\{\boldsymbol{a}(\theta\_1,t)\,\boldsymbol{a}(\theta\_2,t)\} \, Y\_n^m(\theta\_1) \, Y\_{n'}^{m'}(\theta\_2) \, \mathrm{d}\theta\_1 \mathrm{d}\theta\_2 \\ &= \sigma\_{\texttt{a}}^2 \int\_{\mathbb{S}^2} Y\_n^m(\theta) \, Y\_{n'}^{m'}(\theta) \, \mathrm{d}\theta = \sigma\_{\texttt{a}}^2 \delta\_{nn'} \delta\_{mm'}. \end{split} \tag{A.54}$$

In a perfectly diffuse field, we therefore expect the same norm in every spherical harmonic component per frequency band. We could reformulate this to

$$\mathcal{E}\{|\alpha\_{nm}(t)|^2\} = \mathcal{E}\{|\alpha\_{00}(t)|^2\},$$

however, the temporal disjointness assumption of SDM only invents a drastically thinned-out content in the individual higher-order spherical harmonics. To cover the available temporal information from all $(2n+1)$ spherical harmonic signals within each order $n$, and for a formulation similar to the one for a single-direction component, we may re-formulate

$$\sum\_{m=-n}^{n} \mathcal{E}\{ \left| \alpha\_{nm}(t) \right|^2 \} = (2n+1) \mathcal{E}\{ \left| \alpha\_{00}(t) \right|^2 \}. \tag{A.55}$$

**Spherical convolution**. By the argumentation used above to prove rotation invariance, we can argue that isotropic filtering of spherical patterns is invariant under rotation and must therefore depend only on the order $n$. Spherical convolution is defined in [8] by the coefficients $\beta\_{nm}$ of a function $b(\theta)$ convolved with the coefficients $\alpha\_n$ of a rotationally symmetric shape $a(\theta) = a(\vartheta)$

$$
\gamma\_{nm} = \alpha\_n \,\beta\_{nm}.\tag{A.56}
$$

**Spherical cap function**. A rotationally symmetric spherical cap function of aperture $\pm\frac{\alpha}{2}$, centered around $\vartheta = 0$, briefly $\theta\_\mathrm{z}$, can be written in terms of a unit step $u$. We find the shape coefficients $w\_n$ for its spherical harmonic decomposition by

$$u\left( \cos \vartheta - \cos \frac{\alpha}{2} \right) = \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} w\_n \, Y\_n^m(\theta) \, Y\_n^m(\theta\_\mathrm{z}) = \sum\_{n=0}^{\infty} w\_n \, P\_n(\cos \vartheta) \, \frac{2n+1}{4\pi}, \tag{A.57}$$

where $Y\_n^m(\theta\_\mathrm{z}) = \sqrt{\frac{2n+1}{4\pi}}\, P\_n^m(1) = \sqrt{\frac{2n+1}{4\pi}}\, \delta\_m$ and $P\_n^0 = P\_n$ was used. The coefficients $w\_n$ are obtained by integration $\int\_{-1}^{1} \dots P\_{n'}(\cos\vartheta)\,\mathrm{d}\cos\vartheta$ over Legendre polynomials, using for the right-hand side their orthogonality $\int\_{-1}^{1} P\_n(\zeta)\, P\_{n'}(\zeta)\,\mathrm{d}\zeta = \frac{2}{2n+1}\,\delta\_{nn'}$, which leaves $w\_n\,\frac{2}{2n+1}\,\frac{2n+1}{4\pi} = \frac{w\_n}{2\pi}$, so that

$$w\_n = 2\pi \int\_{\cos \frac{a}{2}}^1 P\_n(x) \,\mathrm{d}x.$$

The integral is solved by $2^n n!\, P\_n = \frac{\mathrm{d}^n v^n}{\mathrm{d}x^n}$ with $v = (x^2-1)$, after replacing the innermost derivative by $\frac{\mathrm{d}}{\mathrm{d}x} = \frac{\mathrm{d}v}{\mathrm{d}x}\frac{\mathrm{d}}{\mathrm{d}v} = 2x\,\frac{\mathrm{d}}{\mathrm{d}v}$, and Leibniz' rule for repeated derivatives

$$\begin{split} \frac{\mathrm{d}^n v^n}{\mathrm{d}x^n} &= \frac{\mathrm{d}^{n-1}}{\mathrm{d}x^{n-1}}\Bigl(2x\, \frac{\mathrm{d}v^n}{\mathrm{d}v}\Bigr) = \frac{\mathrm{d}^{n-1}(2x\, n\, v^{n-1})}{\mathrm{d}x^{n-1}} \\ &= (2n)\left[ \binom{n-1}{0} \frac{\mathrm{d}^0 x}{\mathrm{d}x^0}\, \frac{\mathrm{d}^{n-1} v^{n-1}}{\mathrm{d}x^{n-1}} + \binom{n-1}{1} \frac{\mathrm{d}^1 x}{\mathrm{d}x^1}\, \frac{\mathrm{d}^{n-2} v^{n-1}}{\mathrm{d}x^{n-2}} \right] \\ &= (2n)\left[ x\, \frac{\mathrm{d}^{n-1} v^{n-1}}{\mathrm{d}x^{n-1}} + (n-1)\, \frac{\mathrm{d}^{n-2} v^{n-1}}{\mathrm{d}x^{n-2}} \right]. \end{split}$$

We may increase $n$ by one, observe that the last expression has one differential fewer, thus is an integrated version, and obtain after re-inserting $2^n n!\, P\_n = \frac{\mathrm{d}^n v^n}{\mathrm{d}x^n}$

$$2^{n+1}(n+1)!\,P\_{n+1} = 2(n+1)\left[2^n n! \; x\, P\_n + n\,2^n n! \int P\_n\,\mathrm{d}x - nC\right]$$

$$\int P\_n\,\mathrm{d}x = \frac{P\_{n+1} - x\, P\_n}{n} + C \tag{A.58}$$

With the definite integration limits $x\_0 = \cos\frac{\alpha}{2}$ and 1, the integral $\int\_{x\_0}^{1} P\_n\,\mathrm{d}x$ only depends on the lower boundary, as $P\_{n+1}(1) - 1\cdot P\_n(1) = 0$,

$$\int\_{x\_0}^{1} P\_n(x) dx = -\frac{P\_{n+1}(\mathbf{x}\_0) - \mathbf{x}\_0 P\_n(\mathbf{x}\_0)}{n} \tag{A.59}$$

$$w\_n = -2\pi\, \frac{P\_{n+1}\left(\cos\frac{\alpha}{2}\right) - \cos\frac{\alpha}{2}\, P\_n\left(\cos\frac{\alpha}{2}\right)}{n}, \qquad \text{for } n > 0, \tag{A.60}$$

and $w\_0 = 2\pi\int\_{\cos\frac{\alpha}{2}}^{1}\mathrm{d}x = 2\pi\left(1-\cos\frac{\alpha}{2}\right)$ for $n=0$.

The recurrence $(2n+1)\,x\,P\_n - (n+1)\,P\_{n+1} = n\,P\_{n-1}$ alternatively yields, for $n>0$,

$$w\_n = 2\pi\, \frac{P\_{n-1}\left(\cos\frac{\alpha}{2}\right) - P\_{n+1}\left(\cos\frac{\alpha}{2}\right)}{2n+1} = 2\pi\, \frac{P\_{n-1}\left(\cos\frac{\alpha}{2}\right) - \cos\frac{\alpha}{2}\,P\_n\left(\cos\frac{\alpha}{2}\right)}{n+1}.\tag{A.61}$$
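
A quick numerical cross-check of (A.60) and (A.61) against the defining integral $w\_n = 2\pi\int\_{\cos\frac{\alpha}{2}}^{1} P\_n\,\mathrm{d}x$ (the cap aperture below is an arbitrary example of ours):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

alpha = np.deg2rad(90.0)                 # example cap aperture
x0 = np.cos(alpha / 2)
P = [Legendre.basis(n) for n in range(7)]
xi, wq = leggauss(32)
x = 0.5*(1 - x0)*xi + 0.5*(1 + x0)       # map [-1, 1] -> [x0, 1]

assert np.isclose(2*np.pi*(1 - x0),      # w_0 against the quadrature rule
                  2*np.pi * 0.5*(1 - x0) * np.sum(wq))
for n in range(1, 6):
    w_A60 = -2*np.pi * (P[n+1](x0) - x0*P[n](x0)) / n
    w_A61 = 2*np.pi * (P[n-1](x0) - P[n+1](x0)) / (2*n + 1)
    w_num = 2*np.pi * 0.5*(1 - x0) * np.sum(wq * P[n](x))   # Gauss-Legendre
    assert np.isclose(w_A60, w_A61) and np.isclose(w_A60, w_num)
```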

#### **A.4 Encoding to SH and Decoding to SH**

*Mode-matching decoder*: L loudspeakers driven by the weights *gl* and given by their directions {**θ***l*} produce a pattern *f* (*θ*) linearly composed of Dirac deltas

$$f(\boldsymbol{\theta}) = \sum\_{l=1}^{L} \delta(\boldsymbol{\theta}^{\mathrm{T}} \boldsymbol{\theta}\_{l} - 1) \, \mathrm{g}\_{l} \tag{A.62}$$

$$= \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} Y\_{n}^{m}(\boldsymbol{\theta}) \sum\_{l=1}^{L} Y\_{n}^{m}(\boldsymbol{\theta}\_{l}) \, \mathrm{g}\_{l} = \mathbf{y}(\boldsymbol{\theta})^{\mathrm{T}} \mathbf{Y} \, \mathrm{g},$$

in an order-unlimited representation. The vector $\mathbf{y} = [Y\_n^m(\boldsymbol{\theta})]\_{nm}$ contains all the spherical harmonics $0 \le n \le \infty$, $-n \le m \le n$, in a suitable order, e.g. the Ambisonic Channel Number (ACN) $n^2 + n + m$, and the matrix $\mathbf{Y} = [\mathbf{y}(\boldsymbol{\theta}\_l)]\_l$ contains the spherical harmonic coefficient vectors of every loudspeaker. Obviously, the spherical harmonic coefficients synthesized by the loudspeakers are $\boldsymbol{\phi} = \mathbf{Y}\,\mathrm{g}$, so that

$$f(\boldsymbol{\theta}) = \mathbf{y}(\boldsymbol{\theta})^{\mathrm{T}} \mathbf{Y} \, \mathrm{g} = \mathbf{y}(\boldsymbol{\theta})^{\mathrm{T}} \boldsymbol{\phi}.$$

With L loudspeakers, at most (N + 1)<sup>2</sup> ≤ L spherical harmonics can be controlled. Therefore control typically restricts to the under-determined Nth-order subspace

$$
\boldsymbol{\phi\_N} = \mathbf{Y}\_N \mathbf{g},
$$

in which we can synthesize any coefficient vector $\boldsymbol{\phi}\_{\mathrm{N}}$. To get a finite and well-determined solution despite the exceeding and arbitrary degrees of freedom in $\mathrm{g}$, the least-squares solution for $\mathrm{g}$ is searched under the constraint

$$\begin{aligned} \min & \left\| \mathbf{g} \right\|^2\\ \text{subject to: } & \boldsymbol{\phi}\_{\text{N}} = \mathbf{Y}\_{\text{N}} \,\mathbf{g}, \end{aligned} \tag{A.63}$$

yielding the cost function with the Lagrange multipliers *λ*

$$J(\mathbf{g}, \boldsymbol{\lambda}) = \mathbf{g}^{\mathrm{T}} \mathbf{g} + (\boldsymbol{\phi}\_{\mathrm{N}} - \mathbf{Y}\_{\mathrm{N}}\, \mathbf{g})^{\mathrm{T}} \boldsymbol{\lambda}.$$

For the optimum, the derivative with respect to $\mathbf{g}$ is zero, as is the corresponding derivative with respect to $\boldsymbol{\lambda}$:

$$\frac{\partial J}{\partial \mathbf{g}} = 2\mathbf{g}\_{\rm opt} - Y\_{\rm N}^{\rm T} \boldsymbol{\lambda} = 0,\qquad \qquad \frac{\partial J}{\partial \boldsymbol{\lambda}} = \boldsymbol{\phi}\_{\rm N} - Y\_{\rm N} \,\mathbf{g} = 0.$$

For $\mathbf{g}$, the first equation yields $\mathbf{g}\_\mathrm{opt} = \frac{1}{2}\mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}}\boldsymbol{\lambda}$; for $\boldsymbol{\lambda}$, the original constraint $\boldsymbol{\phi}\_{\mathrm{N}} = \mathbf{Y}\_{\mathrm{N}}\,\mathbf{g}$ only allows inserting the optimal $\mathbf{g}$, yielding $\boldsymbol{\phi}\_{\mathrm{N}} = \mathbf{Y}\_{\mathrm{N}}\left(\frac{1}{2}\mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}}\boldsymbol{\lambda}\_\mathrm{opt}\right)$. Inversion by $\left(\frac{1}{2}\mathbf{Y}\_{\mathrm{N}}\mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}}\right)^{-1}$ from the left yields the multipliers $\boldsymbol{\lambda}\_\mathrm{opt} = 2(\mathbf{Y}\_{\mathrm{N}}\mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}})^{-1}\boldsymbol{\phi}\_{\mathrm{N}}$, so that

$$\mathbf{g} = \mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}} (\mathbf{Y}\_{\mathrm{N}} \mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}})^{-1} \boldsymbol{\phi}\_{\mathrm{N}}.\tag{A.64}$$

The solution is right-inverse to $\mathbf{Y}\_{\mathrm{N}}$, i.e. $\mathbf{Y}\_{\mathrm{N}}\left[\mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}}(\mathbf{Y}\_{\mathrm{N}}\mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}})^{-1}\right] = \mathbf{I}$.

*Best-fit encoder by MMSE*: When given L samples of a spherical function $g(\boldsymbol{\theta})$ at the locations $\{\boldsymbol{\theta}\_l\}$, we can minimize the mean-square error (MMSE)

$$\min \sum\_{l=1}^{L} \left[ g(\boldsymbol{\theta}\_{l}) - \sum\_{n=0}^{N} \sum\_{m=-n}^{n} Y\_{n}^{m}(\boldsymbol{\theta}\_{l}) \, \gamma\_{nm} \right]^{2}$$

to find suitable spherical harmonic coefficients γ*nm*. Using the matrix notation from above, this is

$$\min \left\| \boldsymbol{e} \right\|^2 = \min \left\| \mathbf{g} - \mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}} \boldsymbol{\gamma}\_{\mathrm{N}} \right\|^2 \tag{A.65}$$

and we find by zeroing the derivative

$$\begin{split} \frac{\partial}{\partial \boldsymbol{\gamma}\_{\rm N}} \mathbf{e}^{\rm T} \mathbf{e} &= 2 \left( \frac{\partial \mathbf{e}}{\partial \boldsymbol{\gamma}\_{\rm N}} \right)^{\rm T} \mathbf{e} = -2 \mathbf{Y}\_{\rm N}\, \mathbf{e} = 2 \mathbf{Y}\_{\rm N} \mathbf{Y}\_{\rm N}^{\rm T} \boldsymbol{\gamma}\_{\rm N} - 2 \mathbf{Y}\_{\rm N}\, \mathbf{g} = \mathbf{0}, \\ \Longrightarrow \quad \boldsymbol{\gamma}\_{\rm N} &= (\mathbf{Y}\_{\rm N} \mathbf{Y}\_{\rm N}^{\rm T})^{-1} \mathbf{Y}\_{\rm N} \ \mathbf{g}. \end{split} \tag{A.66}$$

The resulting matrix is left-inverse to the thin matrix $\mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}}$ and can be written in terms of the more general pseudo-inverse $(\mathbf{Y}\_{\mathrm{N}}^{\mathrm{T}})^{\dagger}$.
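
The encoder can be sketched like the decoder above (again with a random stand-in matrix and example dimensions of ours); when the sampled function stems from an order-limited pattern, the coefficients are recovered exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sh, L = 16, 40                     # more sampling directions than coefficients
Y = rng.standard_normal((n_sh, L))   # stand-in for Y_N
E = np.linalg.inv(Y @ Y.T) @ Y       # MMSE encoder (A.66)

assert np.allclose(E @ Y.T, np.eye(n_sh))    # left inverse of the thin Y_N^T
assert np.allclose(E, np.linalg.pinv(Y.T))   # = (Y_N^T)^+
gamma = rng.standard_normal(n_sh)
g = Y.T @ gamma                      # samples of an order-limited pattern
assert np.allclose(E @ g, gamma)     # coefficients recovered exactly
```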

#### **A.5 Covariance Constraint for Binaural Ambisonic Decoding**

The interaural covariance matrix is related to the expectation value of the auto- and cross-covariances of the left and right HRTFs:

$$\boldsymbol{R} = \int\_{\mathbb{S}^2} \begin{bmatrix} h\_{\text{left}}(\boldsymbol{\theta}, \omega) \\ h\_{\text{right}}(\boldsymbol{\theta}, \omega) \end{bmatrix} \begin{bmatrix} h\_{\text{left}}(\boldsymbol{\theta}, \omega)^\* & h\_{\text{right}}(\boldsymbol{\theta}, \omega)^\* \end{bmatrix} \, \mathrm{d}\boldsymbol{\theta} = \int\_{\mathbb{S}^2} \boldsymbol{h}(\boldsymbol{\theta}, \omega)\, \boldsymbol{h}(\boldsymbol{\theta}, \omega)^{\mathrm{H}} \, \mathrm{d}\boldsymbol{\theta}. \tag{A.67}$$

When specified in terms of spherical harmonic coefficients $h = \mathbf{y}^{\mathrm{T}}\boldsymbol{h}\_{\mathrm{SH}}$, the integral $\int\_{\mathbb{S}^2} h\_1^{\*} h\_2\,\mathrm{d}\boldsymbol{\theta}$ of any of $\boldsymbol{R}$'s entries simplifies by the orthonormality of the spherical harmonics, $\boldsymbol{h}\_{\mathrm{SH1}}^{\mathrm{H}} \int\_{\mathbb{S}^2} \mathbf{y}\,\mathbf{y}^{\mathrm{T}}\,\mathrm{d}\boldsymbol{\theta}\; \boldsymbol{h}\_{\mathrm{SH2}} = \boldsymbol{h}\_{\mathrm{SH1}}^{\mathrm{H}} \boldsymbol{h}\_{\mathrm{SH2}}$, and we obviously only need the inner product between the spherical-harmonic coefficients of the HRTFs.

A very-high-order spherical harmonics HRTF dataset $\boldsymbol{H}\_{\mathrm{SH}}^{\mathrm{H}}$ of dimensions $2 \times (\mathrm{M}+1)^2$ with the order $\mathrm{M} \gg \mathrm{N}$ yields a covariance matrix at every frequency

$$\mathcal{R} = H\_{\rm SH}^{\rm H} H\_{\rm SH} = X^{\rm H} X$$

that can be factored into a quadratic form of a $2\times2$ matrix $\boldsymbol{X}$ by Cholesky factorization, which reduces the degrees of freedom involved to the minimum required size. The Nth-order Ambisonically reproduced, high-frequency-modified HRTF dataset $\hat{\boldsymbol{H}}\_{\mathrm{SH}}$ of dimensions $2 \times (\mathrm{M}+1)^2$ also has a $2\times2$ covariance matrix $\hat{\boldsymbol{R}}$ that will differ from $\boldsymbol{R}$, and which we also decompose into Cholesky factors $\hat{\boldsymbol{X}}$,

$$
\hat{\boldsymbol{R}} = \hat{\boldsymbol{H}}\_{\text{SH}}^{\text{H}} \hat{\boldsymbol{H}}\_{\text{SH}} = \hat{\boldsymbol{X}}^{\text{H}} \hat{\boldsymbol{X}}.\tag{A.68}
$$

To equalize *R* = *R***ˆ**, the reproduced HRTF set is corrected by a 2 × 2 filter matrix *M*,

$$
\hat{H}\_{\rm SH,corr} = \hat{H}\_{\rm SH}\, M.\tag{A.69}
$$

This is done properly as soon as

$$X^{\mathcal{H}}X = M^{\mathcal{H}}\hat{X}^{\mathcal{H}}\hat{X}M = M^{\mathcal{H}}\hat{X}^{\mathcal{H}}\overbrace{\mathcal{Q}^{\mathcal{H}}\mathcal{Q}}^{I}\hat{X}M,\tag{A.70}$$

and the orthogonal matrix $\boldsymbol{Q}$ is used to compensate for degrees of freedom that the Cholesky factors $\boldsymbol{X}$ and $\hat{\boldsymbol{X}}$ have in sign, phase, and mixing with regard to each other. We recognize the root and hereby the preliminary solution for $\boldsymbol{M}$

$$\mathbf{X} = \mathbf{Q}\hat{\mathbf{X}}\mathbf{M}, \qquad \Rightarrow \mathbf{M} = \hat{\mathbf{X}}^{-1}\mathbf{Q}^{\text{H}}\mathbf{X}. \tag{\text{A.71}}$$

This leaves *<sup>H</sup>***<sup>ˆ</sup>** SH,corr <sup>=</sup> *<sup>H</sup>***<sup>ˆ</sup>** SH *<sup>X</sup>***<sup>ˆ</sup>** <sup>−</sup><sup>1</sup> *<sup>Q</sup>*<sup>H</sup> *<sup>X</sup>* depending on an unspecific orthogonal 2 <sup>×</sup> 2 matrix *Q*. To obtain a corrected-covariance HRTFs *H***ˆ** corr.SH of highest-possible phase-alignment and correlation to its uncorrected counterpart *H***ˆ** SH, we maximize the trace, i.e. the sum of diagonal elements

$$\max \mathfrak{Re}\, \text{Tr} \{ \hat{\boldsymbol{H}}\_{\text{SH}}^{\text{H}} \hat{\boldsymbol{H}}\_{\text{corr,SH}} \} = \max \mathfrak{Re}\, \text{Tr} \{ \hat{\boldsymbol{H}}\_{\text{SH}}^{\text{H}} \hat{\boldsymbol{H}}\_{\text{SH}} \hat{\boldsymbol{X}}^{-1} \boldsymbol{Q}^{\text{H}} \boldsymbol{X} \} = \max \mathfrak{Re}\, \text{Tr} \{ \hat{\boldsymbol{X}}^{\text{H}} \boldsymbol{Q}^{\text{H}} \boldsymbol{X} \} = \max \mathfrak{Re}\, \text{Tr} \{ \boldsymbol{Q}^{\text{H}} \boldsymbol{X} \hat{\boldsymbol{X}}^{\text{H}} \}.$$

For the last expression, the property $\text{Tr}\{\boldsymbol{AB}\} = \text{Tr}\{\boldsymbol{BA}\}$ was used. An orthogonal matrix $\boldsymbol{Q}^{\mathrm{H}} = \boldsymbol{V}\boldsymbol{U}^{\mathrm{H}}$ composed of two orthogonal matrices $\boldsymbol{U}$ and $\boldsymbol{V}$ would yield $\text{Tr}\{\boldsymbol{V}\boldsymbol{U}^{\mathrm{H}}\boldsymbol{X}\hat{\boldsymbol{X}}^{\mathrm{H}}\} = \text{Tr}\{\boldsymbol{U}^{\mathrm{H}}\boldsymbol{X}\hat{\boldsymbol{X}}^{\mathrm{H}}\boldsymbol{V}\}$, and it would maximize the trace if $\boldsymbol{U}$ and $\boldsymbol{V}^{\mathrm{H}}$ diagonalized $\boldsymbol{X}\hat{\boldsymbol{X}}^{\mathrm{H}}$. This is accomplished by the singular-value decomposition (SVD) $\boldsymbol{X}\hat{\boldsymbol{X}}^{\mathrm{H}} = \boldsymbol{U}\boldsymbol{S}\boldsymbol{V}^{\mathrm{H}}$, when the singular values $\boldsymbol{S} = \text{diag}\{[s\_1, s\_2]\}$ are real and positive, as in most SVD implementations. Using $\boldsymbol{U}$ and $\boldsymbol{V}$, the desired solution is:

$$
\hat{H}\_{\text{corr,SH}} = \hat{H}\_{\text{SH}} \hat{X}^{-1} \mathbf{V} \mathbf{U}^{\text{H}} \mathbf{X}.\tag{A.72}
$$

If the SVD delivers negative or complex-valued singular values, the complex/negative factor just needs to be pulled out and factored into either the corresponding left or right singular vector.
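
The complete correction (A.68)–(A.72) can be sketched numerically. The random complex matrices below are stand-ins for the HRTF coefficient sets, not real data; for complex data, "orthogonal" becomes unitary, and `numpy`'s Cholesky and SVD provide the factors:

```python
import numpy as np

rng = np.random.default_rng(3)
H    = rng.standard_normal((64, 2)) + 1j * rng.standard_normal((64, 2))  # reference
Hhat = rng.standard_normal((64, 2)) + 1j * rng.standard_normal((64, 2))  # reproduced

# upper Cholesky factors: R = X^H X, Rhat = Xhat^H Xhat
X    = np.linalg.cholesky(H.conj().T @ H).conj().T
Xhat = np.linalg.cholesky(Hhat.conj().T @ Hhat).conj().T
U, s, Vh = np.linalg.svd(X @ Xhat.conj().T)               # X Xhat^H = U S V^H
M = np.linalg.inv(Xhat) @ Vh.conj().T @ U.conj().T @ X    # M = Xhat^{-1} V U^H X
Hcorr = Hhat @ M                                          # (A.72)

assert np.allclose(Hcorr.conj().T @ Hcorr, H.conj().T @ H)  # covariance equalized
assert np.all(s > 0)                                        # real, positive S
```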

#### **A.6 Physics of the Helmholtz Equation**

#### *A.6.1 Adiabatic Compression*

We search for a physical compression equation relating pressure *p* and volume *V*.

*Ideal gas*. The gas pressure *p* inside the volume *V* obeys the ideal gas law [9]

$$p\,V = n\,R\,T,\tag{A.73}$$

with $n$ measuring the amount of substance in moles, $R$ the gas constant, and $T$ the temperature. This would yield a valid compression equation if the medium of sound propagation were isothermal. However, this is not the case, $T \neq \text{const}$: local temperature fluctuations happen too fast to be equalized by thermal dissipation. Isothermal compression would be too soft, and the resulting speed of sound would be off by $-15\%$. A compression law involving fluctuations of all three quantities $(p, V, T)$ needs an additional equation.
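
The stated $-15\%$ is easy to verify: with the speed of sound $c = \sqrt{\mathrm{d}p/\mathrm{d}\varrho}$, the isothermal law gives $c = \sqrt{p/\varrho}$ and the adiabatic law $c = \sqrt{\gamma\, p/\varrho}$, so the isothermal value is low by the factor $1/\sqrt{\gamma}$, with $\gamma = c\_\mathrm{p}/c\_\mathrm{V} \approx 1.4$ for air:

```python
import numpy as np

# isothermal speed of sound relative to the adiabatic one: 1/sqrt(gamma)
gamma = 1.4                          # c_p/c_V for (diatomic) air
error = 1/np.sqrt(gamma) - 1
assert abs(error - (-0.155)) < 0.005   # about -15 %, as stated in the text
```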

*First law of thermodynamics*. In thermodynamics [10–12], the enthalpy $H$ describes the energy required to heat up a freely expanding gas under constant pressure $p$. The enthalpy consists of the internal energy $U$ required to heat up the gas in a constant volume, which is easier, plus the ideal-gas volume work $p\,V$ taken by the gas to expand under the constant external pressure

$$H = U + p\,V,\tag{A.74}$$

$$\text{Specifically } n\,c_\mathrm{p}\,T = n\,c_\mathrm{V}\,T + n\,R\,T, \qquad \Rightarrow R = c_\mathrm{p} - c_\mathrm{V}.$$

The quantities $c_\mathrm{p}$ and $c_\mathrm{V}$ are the specific heat capacities for heating up a gas that is either expanding ($p$ = const.) or confined in a fixed volume ($V$ = const.) to a temperature $T$, and they can be accurately measured or modeled. Obviously, the gas constant $R$ is the difference between the two. For sound propagation to be isenthalpic, the energy must fluctuate between internal energy $U$ and volume work $pV$.
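As a quick plausibility check (not part of the derivation), $R = c_\mathrm{p} - c_\mathrm{V}$ can be verified with commonly tabulated molar heat capacities of dry air near room temperature; the numbers below are assumed textbook values:

```python
# Hedged numerical check of R = c_p - c_V with assumed tabulated molar
# heat capacities of dry air near room temperature.
c_p = 29.07   # J/(mol K), molar heat capacity at constant pressure
c_v = 20.76   # J/(mol K), molar heat capacity at constant volume
R = 8.314     # J/(mol K), universal gas constant

assert abs((c_p - c_v) - R) < 0.05
# their ratio also gives the adiabatic exponent gamma used below
assert abs(c_p / c_v - 1.4) < 0.01
```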

*Adiabatic process*. The above steady-state equations are not yet useful to describe short-term fluctuations of $p$, $V$, and $T$ in time and space. A differential formulation relating the changes in enthalpy, internal energy, and volume work, $\mathrm{d}H = \mathrm{d}U + p\,\mathrm{d}V$, is more useful. Moreover, we regard packages of a constant amount of substance whose internal heat-up is due to compression only and not due to external enthalpy sources; the process is therefore isenthalpic, $\mathrm{d}H = 0$, see [10, Sect. 3.12.2], [13]

$$0 = n\,c_\mathrm{V}\,\mathrm{d}T + \frac{n\,R\,T}{V}\,\mathrm{d}V.$$

We may divide by $n\,c_\mathrm{V}\,T$, replace $R = c_\mathrm{p} - c_\mathrm{V}$, and obtain $\frac{\mathrm{d}T}{T} + \bigl(\frac{c_\mathrm{p}}{c_\mathrm{V}} - 1\bigr)\frac{\mathrm{d}V}{V} = 0$, whose integration yields $\ln T + \bigl(\frac{c_\mathrm{p}}{c_\mathrm{V}} - 1\bigr)\ln V = \ln T V^{\frac{c_\mathrm{p}}{c_\mathrm{V}} - 1} = \text{const}$, hence $T\,V^{\frac{c_\mathrm{p}}{c_\mathrm{V}} - 1} = \text{const}$, and with the ideal gas equation inserted as $T = \frac{p\,V}{n\,R}$, the *adiabatic process law* becomes

$$p\,V^{\frac{c_\mathrm{p}}{c_\mathrm{V}}} = \text{const},\tag{A.75}$$

for which the adiabatic exponent is frequently expressed as $\gamma = \frac{c_\mathrm{p}}{c_\mathrm{V}}$. For air, the exponent is $\gamma = 1.4$, and we may express a state change as $(p_0, V_0) \to (p_0 + p,\ V_0 + V)$. The equation $p_0\,V_0^\gamma = (p_0 + p)(V_0 + V)^\gamma$ yields, after division by $p_0\,V_0^\gamma$ and by $(1 + \frac{V}{V_0})^\gamma$:

$$1 + \frac{p}{p\_0} = \left(1 + \frac{V}{V\_0}\right)^{-\gamma} \approx 1 - \gamma \frac{V}{V\_0}, \qquad \text{hence } p = -\gamma \text{ } p\_0 \frac{V}{V\_0}.$$

Assuming the Cartesian intervals $\Delta x$, $\Delta y$, $\Delta z$ measured in the resting gas define its volume $V_0 = \Delta x\,\Delta y\,\Delta z$, and the deflections $\xi(x)$, $\eta(y)$, $\zeta(z)$ describe its deformed state after a volume change to $V$, we can approximate the volume change well enough by the three independent volume changes $\Delta\xi\,\Delta y\,\Delta z$, $\Delta x\,\Delta\eta\,\Delta z$, and $\Delta x\,\Delta y\,\Delta\zeta$, resulting from the superimposed individual elongations into the three coordinate directions,

$$\lim_{V_0 \to 0} \frac{\Delta V}{V_0} = \lim_{V_0 \to 0} \frac{\Delta\xi\,\Delta y\,\Delta z + \Delta x\,\Delta\eta\,\Delta z + \Delta x\,\Delta y\,\Delta\zeta}{\Delta x\,\Delta y\,\Delta z} = \frac{\partial \xi}{\partial x} + \frac{\partial \eta}{\partial y} + \frac{\partial \zeta}{\partial z} = \nabla^\mathrm{T}\boldsymbol{\xi}.$$

Replacing the bulk modulus $K = \gamma\,p_0 = \rho\,c^2$ of air by more common constants,<sup>1</sup> where $c = \sqrt{K/\rho}$, and applying the time derivative $\frac{\partial}{\partial t}$, we get the equation of compression in its typical form using the velocities $\frac{\partial \boldsymbol{\xi}}{\partial t} = \boldsymbol{v}$,

$$\frac{\partial p}{\partial t} = -\rho\,c^2\,\nabla^\mathrm{T}\boldsymbol{v}.\tag{A.76}$$
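A short numerical check relates the bulk modulus $K = \gamma p_0 = \rho c^2$ to the footnote's typical constants and confirms the earlier remark that isothermal compression (i.e. $K = p_0$) would put the speed of sound off by about $-15\%$:

```python
import math

# Check of K = gamma*p0 = rho*c^2 and of the -15% remark, using the
# typical constants from the footnote (gamma=1.4, p0=1e5 Pa, rho=1.2 kg/m^3).
gamma, p0, rho = 1.4, 1e5, 1.2

c_adiabatic = math.sqrt(gamma * p0 / rho)   # close to c = 343 m/s
c_isothermal = math.sqrt(p0 / rho)          # isothermal K = p0 is too soft

assert abs(c_adiabatic - 343) < 2
assert abs(c_isothermal / c_adiabatic - 1) > 0.14  # about -15% too slow
```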

#### *A.6.2 Potential and Kinetic Sound Energies, Intensity, Diffuseness*

The *potential* energy density or *volume work* stored in the elastic medium that gets compressed by a deformation $\mathrm{d}V$ increases with $\mathrm{d}w_\mathrm{p} = p\,\mathrm{d}V$, while the deformation also increases the pressure by $\mathrm{d}p = K\,\mathrm{d}V$. We may substitute $\mathrm{d}V = K^{-1}\mathrm{d}p$,

<sup>1</sup>Typical constants are $\gamma = 1.4$, $p_0 = 10^5\,$Pa, $\rho = 1.2\,$kg/m³, $c = 343\,$m/s.

yielding $\mathrm{d}w_\mathrm{p} = K^{-1}\,p\,\mathrm{d}p$. The volume work stored by a pressure increase from 0 to $p$ is

$$w_\mathrm{p} = \int_0^p \frac{p\,\mathrm{d}p}{K} = \frac{p^2}{2K} = \frac{p^2}{2\rho c^2}.\tag{A.77}$$

The *kinetic* energy density stored in the motion of the medium along any axis, e.g. $x$, increases by acceleration against its mass, $\mathrm{d}w_{v_x} = \rho\,\frac{\mathrm{d}v_x}{\mathrm{d}t}\,\mathrm{d}x$. The velocity is $v_x = \frac{\mathrm{d}x}{\mathrm{d}t}$, so that we substitute $\mathrm{d}x = v_x\,\mathrm{d}t$ to get $\mathrm{d}w_{v_x} = \rho\,v_x\,\mathrm{d}v_x$. The total kinetic energy density stored in velocities increasing from 0 to $v_x, v_y, v_z$ is

$$w_\mathrm{v} = \rho\left[\int_0^{v_x} v_x\,\mathrm{d}v_x + \int_0^{v_y} v_y\,\mathrm{d}v_y + \int_0^{v_z} v_z\,\mathrm{d}v_z\right] = \rho\,\frac{v_x^2 + v_y^2 + v_z^2}{2} = \frac{\rho\,\|\boldsymbol{v}\|^2}{2}.\tag{A.78}$$

*Total energy density and intensity*. The total energy density therefore becomes

$$w = w_\mathrm{p} + w_\mathrm{v} = \frac{p^2}{2\rho c^2} + \frac{\rho\,\|\boldsymbol{v}\|^2}{2},\tag{A.79}$$

and differentiated with respect to time, it becomes

$$\frac{\partial w}{\partial t} = \frac{p}{\rho c^2}\frac{\partial p}{\partial t} + \rho\,\boldsymbol{v}^\mathrm{T}\frac{\partial \boldsymbol{v}}{\partial t} = -p\,\nabla^\mathrm{T}\boldsymbol{v} - \boldsymbol{v}^\mathrm{T}\nabla p = -\nabla^\mathrm{T}(p\,\boldsymbol{v}) = -\nabla^\mathrm{T}\boldsymbol{I},\tag{A.80}$$

and defines the (time-domain) intensity vector $\boldsymbol{I} = p\,\boldsymbol{v}$ that describes the energy flow in space. Hereby, $\frac{\partial w}{\partial t} = -\nabla^\mathrm{T}\boldsymbol{I}$ expresses that only a non-zero *divergence* of the intensity causes an energy increase (source) or loss (absorption) in the lossless medium.

*Direction of arrival and diffuseness*: The intensity vector carries a meaning in its own right: it displays into which direction the energy flows (direction of emission). In the frequency domain, it becomes $\boldsymbol{I} = \mathrm{Re}\{p^*\boldsymbol{v}\}$, and for a plane-wave sound field $p = e^{\mathrm{i}k\,\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{r}}$, where $\boldsymbol{v} = -\frac{\nabla p}{\mathrm{i}k\rho c} = -\frac{\boldsymbol{\theta}_\mathrm{s}\,p}{\rho c}$, it indicates the direction of arrival (DOA)

$$\boldsymbol{r}_\mathrm{DOA} = -\frac{\rho c\,\boldsymbol{I}}{|p|^2} = -\frac{\rho c\,\mathrm{Re}\{p^*\boldsymbol{v}\}}{|p|^2} = \frac{\rho c\,|p|^2\,\boldsymbol{\theta}_\mathrm{s}}{\rho c\,|p|^2} = \boldsymbol{\theta}_\mathrm{s}.\tag{A.81}$$

An ideal, uniformly enveloping diffuse field is composed of uncorrelated plane waves, $E\bigl\{\frac{a(\boldsymbol{\theta}_1)^*a(\boldsymbol{\theta}_2)}{4\pi}\bigr\} = \frac{a^2}{4\pi}\,\delta(1 - \boldsymbol{\theta}_1^\mathrm{T}\boldsymbol{\theta}_2)$, resulting in the sound pressure $p = \oint_{\mathbb{S}^2}\frac{a(\boldsymbol{\theta}_\mathrm{s})}{\sqrt{4\pi}}\,e^{\mathrm{i}k\boldsymbol{\theta}_\mathrm{s}^\mathrm{T}\boldsymbol{r}}\,\mathrm{d}\boldsymbol{\theta}_\mathrm{s}$. While the expected squared sound pressure is non-zero as before, $E\{|p|^2\} = \frac{a^2}{4\pi} = |p|^2$, the expected intensity of the uniformly surrounding waves vanishes, $-\rho c\,E\{\boldsymbol{I}\} = \frac{a^2}{4\pi}\oint_{\mathbb{S}^2}\boldsymbol{\theta}_\mathrm{s}\,\mathrm{d}\boldsymbol{\theta}_\mathrm{s} = \boldsymbol{0}$.

Assuming stochastic interference of all sources, the intensity-based DOA estimator $\boldsymbol{r}_\mathrm{DOA} = -\frac{\rho c\,\mathrm{Re}\{p^*\boldsymbol{v}\}}{|p|^2}$ is therefore the physical equivalent of the $\boldsymbol{r}_\mathrm{E}$ vector measure. A typical diffuseness measure $0 \le \psi \le 1$ relies on its length between 0 and 1

$$\psi = 1 - \|\boldsymbol{r}_\mathrm{DOA}\|^2.\tag{A.82}$$

The signals $W = p$ and $[X, Y, Z]^\mathrm{T} = \sqrt{2}\,\frac{\nabla p}{\mathrm{i}k} = -\sqrt{2}\,\rho c\,\boldsymbol{v}$ of a first-order Ambisonic microphone allow us to formulate a time-domain estimator $\boldsymbol{r}_\mathrm{DOA}$

$$\boldsymbol{r}_\mathrm{DOA} = -\frac{\rho c\,E\{\boldsymbol{I}\}}{E\{p^2\}} = -\frac{\rho c\,E\{p\,\boldsymbol{v}\}}{E\{p^2\}} = \frac{E\{W\,[X,Y,Z]^\mathrm{T}\}}{\sqrt{2}\,E\{W^2\}}.\tag{A.83}$$
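A minimal sketch of the estimator in Eq. (A.83) on synthetic first-order signals may look as follows; the chosen direction $\boldsymbol{\theta}_\mathrm{s}$ and the noise signal are assumptions for illustration. For a single plane wave, the diffuseness of Eq. (A.82) should vanish:

```python
import numpy as np

# Sketch of the time-domain DOA estimator of Eq. (A.83) on synthetic
# first-order signals; for a plane wave from theta_s, the signals obey
# W = p and [X,Y,Z]^T = sqrt(2)*theta_s*p.
rng = np.random.default_rng(0)
theta_s = np.array([0.6, 0.8, 0.0])        # assumed unit DOA vector
s = rng.standard_normal(48000)             # assumed plane-wave signal

W = s                                      # omnidirectional signal, W = p
XYZ = np.sqrt(2) * theta_s[:, None] * s    # figure-of-eight signals

r_doa = (W * XYZ).mean(axis=1) / (np.sqrt(2) * (W**2).mean())
psi = 1 - np.sum(r_doa**2)                 # diffuseness, Eq. (A.82)

assert np.allclose(r_doa, theta_s)         # estimator recovers theta_s
assert abs(psi) < 1e-9                     # a single plane wave: psi = 0
```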

#### *A.6.3 Green's Function in 3 Cartesian Dimensions*

We may compose the Green's function, the solution to the inhomogeneous wave equation

$$\left(\triangle - \frac{1}{c^2} \frac{\partial^2}{\partial t^2}\right) G = -\delta(t)\delta(r),$$

from products of complex exponentials with regard to time and the Cartesian directions

$$e^{\mathrm{i}\omega t + \mathrm{i}k_x x + \mathrm{i}k_y y + \mathrm{i}k_z z} = e^{\mathrm{i}\boldsymbol{k}^\mathrm{T}\boldsymbol{r}}\,e^{\mathrm{i}\omega t},\tag{A.84}$$

where the position *x*, *y*, *z* was gathered in a position vector *r*, and the *wave numbers k*x, *k*y, *k*<sup>z</sup> of the individual coordinates were gathered in a wave-number vector *k*. From this solution, we compose the Green's function by superimposing all spatial and temporal complex exponentials in *k* and ω, weighted by an unknown coefficient γ :

$$G = \iint \gamma\,e^{\mathrm{i}\boldsymbol{k}^\mathrm{T}\boldsymbol{r}}\,e^{\mathrm{i}\omega t}\,\mathrm{d}\omega\,\mathrm{d}\boldsymbol{k}.\tag{A.85}$$

Because of $\triangle e^{\mathrm{i}\boldsymbol{k}^\mathrm{T}\boldsymbol{r}} = (-k_x^2 - k_y^2 - k_z^2)\,e^{\mathrm{i}\boldsymbol{k}^\mathrm{T}\boldsymbol{r}} = -k^2\,e^{\mathrm{i}\boldsymbol{k}^\mathrm{T}\boldsymbol{r}}$ and $\frac{\partial^2}{\partial t^2}e^{\mathrm{i}\omega t} = -\omega^2 e^{\mathrm{i}\omega t}$, insertion into the inhomogeneous wave equation yields

$$-\iint \gamma\left(k^2 - \frac{\omega^2}{c^2}\right)e^{\mathrm{i}\boldsymbol{k}^\mathrm{T}\boldsymbol{r}}\,e^{\mathrm{i}\omega t}\,\mathrm{d}\omega\,\mathrm{d}\boldsymbol{k} = -\delta(t)\,\delta(\boldsymbol{r}).$$

Multiple transformations $\iint e^{-\mathrm{i}\hat{\boldsymbol{k}}^\mathrm{T}\boldsymbol{r}}\,e^{-\mathrm{i}\hat{\omega} t}\,\mathrm{d}\boldsymbol{r}\,\mathrm{d}t$ remove the integrals by orthogonality

$$-\iint \gamma\left(k^2 - \frac{\omega^2}{c^2}\right)\underbrace{\left\{\int e^{\mathrm{i}(\boldsymbol{k}-\hat{\boldsymbol{k}})^\mathrm{T}\boldsymbol{r}}\,\mathrm{d}\boldsymbol{r}\right\}}_{(2\pi)^3\,\delta(\boldsymbol{k}-\hat{\boldsymbol{k}})}\underbrace{\left\{\int e^{\mathrm{i}(\omega-\hat{\omega})t}\,\mathrm{d}t\right\}}_{2\pi\,\delta(\omega-\hat{\omega})}\,\mathrm{d}\omega\,\mathrm{d}\boldsymbol{k} = -\underbrace{\int \delta(t)\,e^{-\mathrm{i}\hat{\omega}t}\,\mathrm{d}t}_{1}\,\underbrace{\int \delta(\boldsymbol{r})\,e^{-\mathrm{i}\hat{\boldsymbol{k}}^\mathrm{T}\boldsymbol{r}}\,\mathrm{d}\boldsymbol{r}}_{1},$$

and the unknown coefficient remains $\gamma = \frac{1}{(2\pi)^{3+1}}\,\frac{1}{k^2 - \frac{\omega^2}{c^2}}$. Keeping $G$ in the frequency domain, one factor $\frac{1}{2\pi}$ in $\gamma$ and the integral $\int e^{\mathrm{i}\omega t}\,\mathrm{d}\omega$ are omitted. We transform $\gamma$ back over $\boldsymbol{k}$

$$G = \frac{1}{(2\pi)^3}\int \frac{e^{\mathrm{i}\boldsymbol{k}^\mathrm{T}\boldsymbol{r}}}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}\boldsymbol{k} = \frac{1}{(2\pi)^3}\int \frac{e^{\mathrm{i}kr\zeta}}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}\boldsymbol{k}.$$

By re-expressing $\boldsymbol{k}^\mathrm{T}\boldsymbol{r} = kr\,\boldsymbol{\theta}_\mathrm{k}^\mathrm{T}\boldsymbol{\theta}_\mathrm{r} = kr\cos\vartheta = kr\zeta$, we simplify the integral. *Now we already see formally that Green's function can only depend on the distance $r$ between source and receiver location, $G = G(\omega, r)$.*

The book [14, pp. 110–112] shows a notably compact derivation, which we will use below.

*Derivation by transforming back from the Fourier domain*: For three dimensions, the transformation back from the Fourier domain is relatively easy to accomplish. Before going into details, we recognize that the substitution of $\boldsymbol{k}^\mathrm{T}\boldsymbol{r}$ by $kr\cos\vartheta$ contains the radius of the wave vector, $k = \|\boldsymbol{k}\|$, and the cosine of the angle between $\boldsymbol{r}$ and $\boldsymbol{k}$. In $\boldsymbol{k}$ space, we can always define a correspondingly oriented coordinate system for any $\boldsymbol{r}$ so as to simplify the integral $\iiint_{-\infty}^{\infty}\mathrm{d}\boldsymbol{k} = \int_0^\infty\!\int_0^{2\pi}\!\int_0^\pi k^2\,\mathrm{d}k\,\mathrm{d}\varphi\,\mathrm{d}\cos\vartheta = \int_0^\infty\!\int_0^{2\pi}\!\int_{-1}^1 k^2\,\mathrm{d}k\,\mathrm{d}\varphi\,\mathrm{d}\zeta$. After re-arranging the integrals, we get

$$\begin{aligned}
G &= \int_0^{2\pi}\frac{\mathrm{d}\varphi}{(2\pi)^3}\int_0^\infty \frac{\int_{-1}^1 e^{\mathrm{i}kr\zeta}\,\mathrm{d}\zeta}{k^2 - \frac{\omega^2}{c^2}}\,k^2\,\mathrm{d}k
= \frac{1}{(2\pi)^2}\int_0^\infty \frac{1}{\mathrm{i}kr}\,\frac{e^{\mathrm{i}kr} - e^{-\mathrm{i}kr}}{k^2 - \frac{\omega^2}{c^2}}\,k^2\,\mathrm{d}k\\
&= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\int_0^\infty \frac{e^{\mathrm{i}kr} - e^{-\mathrm{i}kr}}{k^2 - \frac{\omega^2}{c^2}}\,k\,\mathrm{d}k
= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\left[\int_0^\infty \frac{e^{\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k - \int_0^\infty \frac{e^{-\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k\right]\\
&= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\left[\int_0^\infty \frac{e^{\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k - \int_0^{-\infty} \frac{e^{-\mathrm{i}(-k)r}\,(-k)}{(-k)^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}(-k)\right]\\
&= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\left[\int_0^\infty \frac{e^{\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k + \int_{-\infty}^0 \frac{e^{\mathrm{i}kr}\,k}{k^2 - \frac{\omega^2}{c^2}}\,\mathrm{d}k\right]
= \frac{1}{(2\pi)^2}\,\frac{1}{\mathrm{i}r}\int_{-\infty}^\infty \frac{e^{\mathrm{i}kr}}{k^2 - \frac{\omega^2}{c^2}}\,k\,\mathrm{d}k.
\end{aligned}\tag{A.86}$$

The denominator is expanded in partial fractions, $\frac{1}{k^2 - \frac{\omega^2}{c^2}} = \frac{1}{2k}\,\frac{1}{k - \frac{\omega}{c}} + \frac{1}{2k}\,\frac{1}{k + \frac{\omega}{c}}$, yielding

$$G = \frac{1}{2(2\pi)^2} \frac{1}{\mathrm{i}r} \left[ \int\_{-\infty}^{\infty} \frac{e^{\mathrm{i}kr}}{k - \frac{\omega}{c}} \, \mathrm{d}k + \int\_{-\infty}^{\infty} \frac{e^{\mathrm{i}kr}}{k + \frac{\omega}{c}} \, \mathrm{d}k \right]. \tag{A.87}$$

To obtain causal temporal solutions, specific values of the improper and singular integrals of the type

$$h(t) = \int\_{-\infty}^{\infty} \frac{e^{i\omega t}}{\omega - a} \,\mathrm{d}\omega\tag{A.88}$$

are obtained by replacing the improper integral $\int_{-\infty}^{\infty}$ by a closed integration contour, and by introducing a vanishing regularization. Jordan's lemma states that the improper integration $\int_{-\infty}^{\infty}$ is equivalent to a closed integration path $C_+$ of positive orientation involving the additional semi-circle on the upper half of the complex plane, $\int_{-\infty}^{\infty} = \oint_{C_+} = \lim_{R\to\infty}\bigl[\int_{-R}^{R}\mathrm{d}\omega + \int_0^\pi \mathrm{i}Re^{\mathrm{i}\varphi}\,\mathrm{d}\varphi\bigr]$, if the integrand on the semi-circle vanishes, i.e. $\lim_{R\to\infty}\frac{e^{(\mathrm{i}\cos\varphi - \sin\varphi)Rt}\,R}{Re^{\mathrm{i}\varphi} - a} = 0$. This is the case for positive times $t > 0$. For negative times, the integral can be closed using the lower half of the complex plane, $\int_{-\infty}^{\infty} = \oint_{C_-} = \lim_{R\to\infty}\bigl[\int_{-R}^{R}\mathrm{d}\omega + \int_0^{-\pi} \mathrm{i}Re^{\mathrm{i}\varphi}\,\mathrm{d}\varphi\bigr]$, if the semi-circular integral vanishes, which is true for negative times $t < 0$ in our case, see Fig. A.1. We get

$$h(t) = \begin{cases}\oint_{C_+}\dfrac{e^{\mathrm{i}\omega t}}{\omega - a}\,\mathrm{d}\omega, & \text{if } t > 0,\\[2mm] \oint_{C_-}\dfrac{e^{\mathrm{i}\omega t}}{\omega - a}\,\mathrm{d}\omega, & \text{if } t < 0.\end{cases}\tag{A.89}$$

According to Cauchy's integral formula for regular analytic functions $f(z)$ over a single pole $\frac{1}{z-a}$, we obtain

$$\oint_{C_\pm}\frac{f(z)}{z - a}\,\mathrm{d}z = \pm 2\pi\mathrm{i}\begin{cases}f(a), & \text{if the path } C_\pm \text{ surrounds } a,\\ 0, & \text{if } a \text{ lies outside the path } C_\pm.\end{cases}\tag{A.90}$$

If a pole on the real axis $a \in \mathbb{R}$ is slightly shifted by a vanishing imaginary amount, $\lim_{\epsilon\to0^+}\oint_{C_\pm}\frac{e^{\mathrm{i}\omega t}}{\omega - \mathrm{i}\epsilon - a}\,\mathrm{d}\omega$ (regularization), so that it lies within the path $C_+$ and not in $C_-$, the result is perfectly causal and vanishes at negative times: $h(t) = 2\pi\mathrm{i}\,\lim_{\epsilon\to0^+}e^{\mathrm{i}a t - \epsilon t}\,u(t)$, with the unit step function $u(t) = 1$ for $t \ge 0$ and $0$ for $t < 0$, see Fig. A.1.

*Integral in k*. Causality requires the specific regularization in frequency shown above: replacing $\omega$ by $\lim_{\epsilon\to0^+}\omega - \mathrm{i}\epsilon c$ guarantees causality in the partial-fraction expanded Green's function Eq. (A.87). Jordan's lemma requires the path $C_+$ to close the improper path in $k$ for a positive radius $r \ge 0$, cf. Fig. A.2,

$$G = \frac{1}{2(2\pi)^2}\lim_{\epsilon\to0^+}\frac{1}{\mathrm{i}r}\left[\oint_{C_+}\frac{e^{\mathrm{i}kr}\,\mathrm{d}k}{k - \frac{\omega}{c} + \mathrm{i}\epsilon} + \oint_{C_+}\frac{e^{\mathrm{i}kr}\,\mathrm{d}k}{k + \frac{\omega}{c} - \mathrm{i}\epsilon}\right] = \frac{2\pi\mathrm{i}\,e^{-\mathrm{i}\frac{\omega}{c}r}}{2(2\pi)^2\,\mathrm{i}\,r} = \frac{e^{-\mathrm{i}\frac{\omega}{c}r}}{4\pi r}.\tag{A.91}$$
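A finite-difference check (a sketch, with an assumed grid and wave number) confirms that this result solves the homogeneous Helmholtz equation away from the origin, using the radial Laplacian $\triangle G = \frac{1}{r}\frac{\partial^2 (rG)}{\partial r^2}$ for spherically symmetric fields:

```python
import numpy as np

# Check that G = exp(-i*k*r)/(4*pi*r) from Eq. (A.91) satisfies
# (triangle + k^2) G = 0 for r > 0; grid and k = omega/c are assumed.
k = 2.0
r = np.linspace(0.5, 5.0, 20001)
dr = r[1] - r[0]
G = np.exp(-1j * k * r) / (4 * np.pi * r)

rG = r * G
d2 = (rG[2:] - 2 * rG[1:-1] + rG[:-2]) / dr**2   # (rG)'' by differences
residual = d2 / r[1:-1] + k**2 * G[1:-1]          # (triangle + k^2) G

assert np.max(np.abs(residual)) < 1e-4
```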

#### *A.6.4 Radial Solution of the Helmholtz Equation*

The radial part of the Helmholtz equation in spherical coordinates is characterized by the spherical Bessel differential equation in *x* = *kr*

$$y'' + 2x^{-1}y' + \left[1 - n(n+1)x^{-2}\right]y = 0.\tag{A.92}$$

*Recursive construction*. For $n = 0$, we know that the omnidirectional Green's function is a solution diverging at $x = 0$, and it is proportional to $y \propto \frac{e^{-\mathrm{i}x}}{x}$. We can simplify the equation by inserting $y = x^{-1}u_n$, which, with $y' = x^{-1}u_n' - x^{-2}u_n$ and $y'' = x^{-1}u_n'' - 2x^{-2}u_n' + 2x^{-3}u_n$, yields after multiplying with $x$:

$$\begin{aligned} \left[u_n'' - 2x^{-1}u_n' + 2x^{-2}u_n\right] + 2x^{-1}u_n' - 2x^{-2}u_n + \left[1 - n(n+1)x^{-2}\right]u_n &= 0\\ u_n'' + \left[1 - n(n+1)x^{-2}\right]u_n &= 0. \end{aligned}\tag{A.93}$$

Moreover, we attempt to find a recursive definition for *n* > 0 using the approach

$$y_n = x^{-1}u_n, \qquad\qquad u_n = -x^a\left[x^{-a}u_{n-1}\right]'.$$

We evaluate the recursion for the derivatives

$$\begin{aligned}
u_n &= -u_{n-1}' + a\,x^{-1}u_{n-1},\\
u_n' &= -u_{n-1}'' + a\,x^{-1}u_{n-1}' - a\,x^{-2}u_{n-1}, \quad\text{with } -u_{n-1}'' = [1 - n(n-1)\,x^{-2}]\,u_{n-1}\\
&= a\,x^{-1}u_{n-1}' + \{1 - [n(n-1) + a]\,x^{-2}\}\,u_{n-1},\\
u_n'' &= a\,x^{-1}u_{n-1}'' - a\,x^{-2}u_{n-1}' + \{1 - [n(n-1) + a]\,x^{-2}\}\,u_{n-1}' + 2[n(n-1) + a]\,x^{-3}u_{n-1}\\
&= \{1 - [n(n-1) + 2a]\,x^{-2}\}\,u_{n-1}' + \{[2n(n-1) + 2a + an(n-1)]\,x^{-3} - a\,x^{-1}\}\,u_{n-1}\\
&= \{1 - [n(n-1) + 2a]\,x^{-2}\}\,u_{n-1}' + \{[n(n-1)(a+2) + 2a]\,x^{-3} - a\,x^{-1}\}\,u_{n-1}.
\end{aligned}$$

The equation $u_n'' + [1 - n(n+1)x^{-2}]\,u_n = 0$ using the above expressions becomes

$$\begin{aligned}
\{1 - [n(n-1) + 2a]\,x^{-2}\}\,u_{n-1}' + \{[n(n-1)(a+2) + 2a]\,x^{-3} - a\,x^{-1}\}\,u_{n-1}&\\
+\,[1 - n(n+1)x^{-2}]\left[-u_{n-1}' + a\,x^{-1}u_{n-1}\right] &= 0.
\end{aligned}$$

Comparing coefficients for $u_{n-1}'$ and $u_{n-1}$ yields $a = n$

$$\begin{array}{ll}
u_{n-1}': & 1 - 1 - [n(n-1) + 2a - n(n+1)]\,x^{-2} = 2(n-a)\,x^{-2} = 0,\\
u_{n-1}: & [-a + a]\,x^{-1} + [n(n-1)(a+2) + 2a - an(n+1)]\,x^{-3} = 0,\\
& an(n-1) + 2n(n-1) + 2a - an(n+1) = 2(n-1)(n-a) = 0,
\end{array}$$

and hereby a recurrence for $y_n$ from $u_n = -x^n\left[x^{-n}u_{n-1}\right]'$ with $y_n = x^{-1}u_n$, $u_n = x\,y_n$,

$$y_n = -x^{n-1}\left[x^{-(n-1)}y_{n-1}\right]' \qquad\Rightarrow\qquad y_{n+1} = -x^n\left[x^{-n}y_n\right]'.\tag{A.94}$$
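The recurrence of Eq. (A.94) can be checked numerically for the regular solutions (the spherical Bessel functions introduced below) against a standard library; the grid and the centered-difference step are assumptions:

```python
import numpy as np
from scipy.special import spherical_jn

# Numerical sketch of y_{n+1} = -x^n [x^{-n} y_n]' from Eq. (A.94),
# checked for the regular solutions j_n via a centered difference.
x = np.linspace(1.0, 10.0, 9001)
h = x[1] - x[0]

for n in range(3):
    f = x**(-n) * spherical_jn(n, x)       # x^{-n} y_n
    dfdx = (f[2:] - f[:-2]) / (2 * h)      # centered-difference derivative
    lhs = spherical_jn(n + 1, x[1:-1])     # y_{n+1}
    rhs = -x[1:-1]**n * dfdx
    assert np.max(np.abs(lhs - rhs)) < 1e-4
```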

*Singular and regular solution*. We know from the Green's function that the omnidirectional solution should be proportional to $\frac{e^{-\mathrm{i}x}}{x}$. The typical radial solution for an omnidirectional source field is chosen to be the spherical Hankel function of the second kind<sup>2</sup>

$$h\_0^{(2)}(kr) = \frac{e^{-ikr}}{-ikr}, \qquad h\_{n+1}^{(2)}(kr) = -(kr)^n \frac{\mathbf{d}}{\mathbf{d}(kr)} \left[ \frac{1}{(kr)^n} h\_n^{(2)}(kr) \right]. \tag{A.95}$$

However, this solution is not sufficient to solve problems without singularity at $r = 0$. We know that the function $\frac{\sin(kr)}{kr}$ is finite at $kr = 0$, and so are all real parts of the spherical Hankel functions of the second kind, the spherical Bessel functions

$$j\_0(kr) = \frac{\sin(kr)}{kr}, \qquad j\_{n+1}(kr) = -(kr)^n \frac{\mathbf{d}}{\mathbf{d}(kr)} \left[ \frac{1}{(kr)^n} j\_n(kr) \right]. \tag{A.96}$$

<sup>2</sup>Note that some scholars use the Fourier expansion $e^{-\mathrm{i}\omega t}$ with the sign opposite to our $e^{\mathrm{i}\omega t}$, which requires taking the complex conjugate of every expression containing imaginary constants, $h_n^{(1)} = h_n^{(2)*}$.

The solutions are linearly independent. One can check after some calculation that their Wronskian is non-zero [15, Eq. 10.50.1]

$$\begin{vmatrix} j_n(kr) & h_n^{(2)}(kr)\\ j_n'(kr) & h_n'^{(2)}(kr)\end{vmatrix} = j_n(kr)\,h_n'^{(2)}(kr) - j_n'(kr)\,h_n^{(2)}(kr) = -\frac{\mathrm{i}}{(kr)^2}.\tag{A.97}$$
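The Wronskian of Eq. (A.97) is easy to verify numerically with a standard library; the sample arguments are arbitrary choices:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

# Check of j_n h'_n^(2) - j'_n h_n^(2) = -i/(kr)^2 from Eq. (A.97).
x = np.array([0.7, 1.5, 4.2, 9.9])
for n in range(4):
    jn = spherical_jn(n, x); djn = spherical_jn(n, x, derivative=True)
    yn = spherical_yn(n, x); dyn = spherical_yn(n, x, derivative=True)
    h2 = jn - 1j * yn                       # h_n^(2) = j_n - i y_n
    dh2 = djn - 1j * dyn
    wronskian = jn * dh2 - djn * h2
    assert np.allclose(wronskian, -1j / x**2)
```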

Below, the Frobenius method is shown as an alternative way to get these functions.

*Alternative way: Frobenius method*. Given a second-order differential equation with singular coefficients, it can be solved by a generalized infinite power series:

$$y'' + \left(\sum_{l=0}^\infty a_l\,x^l\right)x^{-1}y' + \left(\sum_{l=0}^\infty b_l\,x^l\right)x^{-2}y = 0, \quad \text{solution: } y = \sum_{k=0}^\infty c_k\,x^{k+\gamma}.\tag{A.98}$$

Insertion of the solution yields

$$\sum_{k=0}^\infty (k+\gamma-1)(k+\gamma)\,c_k\,x^{k+\gamma-2} + \sum_{k'=0}^\infty\sum_{l=0}^\infty \left[(k'+\gamma)\,a_l + b_l\right]c_{k'}\,x^{k'+l+\gamma-2} = 0,$$

an index shift $k' + l = k$, with $l = 0 \ldots k$, allows us to pull out the common factor $x^{k+\gamma-2}$

$$\begin{aligned} \sum\_{k=0}^{\infty} \left\{ (k+\gamma-1)(k+\gamma)c\_k + \sum\_{l=0}^{k} [(k-l+\gamma)a\_l + b\_l]c\_{k-l} \right\} x^{k+\gamma-2} &= 0\\ \sum\_{k=0}^{\infty} \left\{ [(k+\gamma+a\_0-1)(k+\gamma) + b\_0]c\_k + \sum\_{l=1}^{k} [(k-l+\gamma)a\_l + b\_l]c\_{k-l} \right\} x^{k+\gamma-2} &= 0. \end{aligned}$$

The coefficient of every exponent of *x* in the above equation must be zero:

$$\text{indicial equation for } k = 0: \quad \left[(\gamma + a_0 - 1)\,\gamma + b_0\right]c_0 = 0,\tag{A.99}$$

$$\text{indicial equation for } k = 1: \quad \left[(\gamma + a_0)(\gamma + 1) + b_0\right]c_1 + \left[a_1\gamma + b_1\right]c_0 = 0,\tag{A.100}$$

$$\text{recurrence for } k > 1: \quad c_k = -\frac{\sum_{l=1}^k \left[(k - l + \gamma)\,a_l + b_l\right]c_{k-l}}{(k + \gamma + a_0 - 1)(k + \gamma) + b_0}.\tag{A.101}$$

Depending on the specific values found for γ , the recurrence, etc. the Frobenius method suggests how to find or construct an independent pair of solutions.

*Spherical Bessel differential equation*. In $y'' + 2x^{-1}y' + [-n(n+1) + x^2]\,x^{-2}y = 0$, all $a_l$ and $b_l$ are zero except $a_0 = 2$, $b_0 = -n(n+1)$, and $b_2 = 1$. The indicial equations and the recurrence become

$$\left[\gamma(\gamma+1) - n(n+1)\right]c_0 = 0,\tag{A.102}$$

$$\left[ (\gamma + 1)(\gamma + 2) - n(n + 1) \right] c\_1 = 0,\tag{A.103}$$

$$\left[(k+\gamma+1)(k+\gamma) - n(n+1)\right]c_k = -c_{k-2}.\tag{A.104}$$

We see that the recurrence is again a two-step recurrence, so that one can choose between an even solution using $c_0 \ne 0$, $c_1 = 0$, yielding $\gamma = n$ or $\gamma = -(n+1)$ [or an odd solution that won't be used, with $c_0 = 0$, $c_1 \ne 0$, yielding $\gamma + 1 = n$ or $\gamma + 1 = -(n+1)$].

*Spherical Bessel functions*. The choice $\gamma = n$ yields a solution converging everywhere: powers of $x$ are all positive, and the recurrence $c_k = -\frac{c_{k-2}}{(n+k+1)(n+k) - n(n+1)} = -\frac{c_{k-2}}{k\,(k+2n+1)}$ yields a convergence radius $R = \lim_{k\to\infty}\left|\frac{c_{k-2}}{c_k}\right| = \lim_{k\to\infty}k\,(k+2n+1) = \infty$. With a starting value $c_0 = \frac{2^n\,n!}{(2n+1)!}$, the solutions are called spherical Bessel functions [5, Chap. 3.4]

$$j_n = (2x)^n\sum_{k=0}^\infty \frac{(-1)^k\,(n+k)!}{k!\,[2(n+k)+1]!}\,x^{2k},\tag{A.105}$$

which are a physical set of regular solutions with *n*-fold zero at 0. The spherical Bessel function for *n* = 0 is

$$j_0(x) = \frac{\sin x}{x}.\tag{A.106}$$
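A sketch of the power series in Eq. (A.105), truncated at an assumed 30 terms (ample for the moderate arguments used here), reproduces a library implementation as well as Eq. (A.106):

```python
import math
import numpy as np
from scipy.special import spherical_jn

# Truncated evaluation of the series Eq. (A.105); the term count is an
# assumption that converges well for |x| <= 6.
def jn_series(n, x, terms=30):
    return (2 * x)**n * sum(
        (-1)**k * math.factorial(n + k)
        / (math.factorial(k) * math.factorial(2 * (n + k) + 1)) * x**(2 * k)
        for k in range(terms))

x = np.linspace(0.1, 6.0, 60)
for n in range(4):
    assert np.allclose(jn_series(n, x), spherical_jn(n, x))
# and j_0(x) = sin(x)/x as in Eq. (A.106)
assert np.allclose(jn_series(0, x), np.sin(x) / x)
```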

With the above recursive definition iterated [5, Eq. 3.4.15], one can define

$$j\_{n+1} = -\mathbf{x}^n \frac{\mathbf{d}}{\mathbf{dx}} \left(\frac{1}{\mathbf{x}^n} j\_n\right), \qquad \qquad j\_n = (-\mathbf{x})^n \left(\frac{1}{\mathbf{x}} \frac{\mathbf{d}}{\mathbf{dx}}\right)^n j\_0. \tag{A.107}$$

*Spherical Neumann functions*. For $\gamma = -(n+1)$ and $c_0 \ne 0$, $c_1 = 0$, the recurrence is $c_k = -\frac{c_{k-2}}{(n-k+1)(n-k) - n(n+1)} = \frac{c_{k-2}}{k\,(2n+1-k)}$ and yields the spherical Neumann functions with an $(n+1)$-fold pole at 0. They also obey the recursive definition from above,

$$y_0 = -\frac{\cos x}{x}, \qquad\qquad y_n = (-x)^n\left(\frac{1}{x}\frac{\mathrm{d}}{\mathrm{d}x}\right)^n y_0.\tag{A.108}$$

*Spherical Hankel functions*. The spherical Neumann and Bessel functions, based on either cos or sin, are clearly linearly independent. The spherical Bessel functions are useful for representing fields convergent everywhere. Physical source fields (Green's function) diverge at the source location and exhibit a specific radiating phase, with $G \propto \frac{e^{-\mathrm{i}x}}{x}$. The spherical Bessel and Neumann functions are asymptotically similar to [15, Eq. 10.52.3]

$$\lim_{x\to\infty} j_n = \frac{\sin(x - \frac{n\pi}{2})}{x}, \qquad \lim_{x\to\infty} y_n = -\frac{\cos(x - \frac{n\pi}{2})}{x},\tag{A.109}$$

therefore only their combination into spherical Hankel functions of the second kind

$$h_n^{(2)} = j_n - \mathrm{i}\,y_n\tag{A.110}$$

yields a useful physical set of singular solutions. They inherit their $(n+1)$-fold pole at 0 from the spherical Neumann functions. Their limiting form for large arguments is

$$\begin{aligned}
\lim_{x\to\infty} h_n^{(2)}(x) &= -x^{n-1}\lim_{x\to\infty}\frac{\mathrm{d}}{\mathrm{d}x}\left(\frac{1}{x^{n-1}}\,h_{n-1}^{(2)}\right)\\
&= -x^{n-1}\lim_{x\to\infty}\left(-\frac{n-1}{x^n}\,h_{n-1}^{(2)} + \frac{1}{x^{n-1}}\frac{\mathrm{d}}{\mathrm{d}x}h_{n-1}^{(2)}\right) = -\lim_{x\to\infty}\frac{\mathrm{d}}{\mathrm{d}x}h_{n-1}^{(2)}\\
&= (-1)^n\lim_{x\to\infty}\frac{\mathrm{d}^n}{\mathrm{d}x^n}h_0^{(2)} = \mathrm{i}^n\,h_0^{(2)}(x).
\end{aligned}\tag{A.111}$$

With Eqs. (A.110), (A.106) and (A.108), the zeroth-order spherical Hankel function is $h_0^{(2)}(x) = \frac{e^{-\mathrm{i}x}}{-\mathrm{i}x}$.
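The limiting form of Eq. (A.111) can be checked numerically at an assumed large argument, e.g. $x = 10^5$, where the relative deviation is of the order $\frac{n(n+1)}{2x}$:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

# Check of lim h_n^(2)(x) = i^n h_0^(2)(x) from Eq. (A.111); x = 1e5 is
# an assumed "large" argument.
x = 1e5
h0 = np.exp(-1j * x) / (-1j * x)            # h_0^(2)(x) = e^{-ix}/(-ix)
for n in range(4):
    h2 = spherical_jn(n, x) - 1j * spherical_yn(n, x)
    assert abs(h2 - 1j**n * h0) < 1e-3 * abs(h0)
```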

*Alternative implementation by cylindrical functions*. We can transform the spherical Bessel differential equation by inserting *y* = *x*<sup>α</sup> *u* and obtain after division by *x*<sup>α</sup>

$$\begin{aligned}
\left[x^\alpha u'' + 2\alpha x^{\alpha-1}u' + \alpha(\alpha-1)x^{\alpha-2}u\right] + 2x^{\alpha-1}u' + 2\alpha x^{\alpha-2}u + \left[1 - n(n+1)x^{-2}\right]x^\alpha u &= 0\\
u'' + 2\,\frac{\alpha+1}{x}\,u' + \left[1 + \frac{\alpha(\alpha+1) - n(n+1)}{x^2}\right]u &= 0.
\end{aligned}$$

For $\alpha = -\frac{1}{2}$, the equation for $u$ becomes the Bessel differential equation, with $\alpha(\alpha+1) - n(n+1) = -\left(n^2 + n + \frac{1}{4}\right) = -\left(n + \frac{1}{2}\right)^2$

$$u'' + \frac{1}{x}\,u' + \left[1 - \frac{(n+\frac{1}{2})^2}{x^2}\right]u = 0.\tag{A.112}$$

Consequently, the spherical Bessel functions and spherical Hankel functions of the second kind can be implemented using the Bessel and Hankel functions that can be found in any standard maths programming library. The specific relations are:

$$j_n(x) = \sqrt{\frac{\pi}{2}}\,\frac{1}{\sqrt{x}}\,J_{n+\frac{1}{2}}(x), \qquad\qquad h_n^{(2)}(x) = \sqrt{\frac{\pi}{2}}\,\frac{1}{\sqrt{x}}\,H_{n+\frac{1}{2}}^{(2)}(x).\tag{A.113}$$
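These half-integer-order relations are straightforward to verify against the cylindrical Bessel and Hankel functions of a standard library; the sampled arguments are arbitrary:

```python
import numpy as np
from scipy.special import jv, hankel2, spherical_jn, spherical_yn

# Check of the half-integer-order relations in Eq. (A.113).
x = np.linspace(0.5, 8.0, 50)
for n in range(4):
    jn_sph = np.sqrt(np.pi / (2 * x)) * jv(n + 0.5, x)
    h2_sph = np.sqrt(np.pi / (2 * x)) * hankel2(n + 0.5, x)
    assert np.allclose(jn_sph, spherical_jn(n, x))
    assert np.allclose(h2_sph, spherical_jn(n, x) - 1j * spherical_yn(n, x))
```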

#### *A.6.5 Green's Function in Spherical Solutions, Angular Distributions, Plane Waves*

We can write the inhomogeneous Helmholtz equation $(\triangle + k^2)G = -\delta$ to be excited by a source at the direction $\boldsymbol{\theta}_0$ at the radius $r_0$. We decompose the excitation into Dirac delta functions in radius and direction, $-r_0^{-2}\,\delta(r - r_0)\,\delta(\boldsymbol{\theta}_0^\mathrm{T}\boldsymbol{\theta} - 1)$. The directional part need not be restricted to the spherical Dirac delta function, so we can take a distribution of sources at $r_0$, weighted by the panning function $g(\boldsymbol{\theta})$,

$$\left(\triangle + k^{2}\right)p = -r\_{0}^{-2}\delta(r - r\_{0})\,\mathrm{g}\,(\theta). \tag{A.114}$$

From the spherical basis solutions, we know that at a radius other than *r*0, *p* can be expanded into spherical harmonics

$$p = \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} \psi\_{nm} \, Y\_n^m(\theta). \tag{A.115}$$

Acting on the decomposition of $p$, the directional part of the Laplacian will yield the eigenvalue of the spherical harmonics, $\triangle_{\varphi,\vartheta}\, Y_n^m = -n(n+1)\,r^{-2}\, Y_n^m$, and its radial part is $\triangle_r = \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}$, as around Eq. (6.11), hence

$$\begin{aligned} (\triangle + k^2) \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \psi_{nm}\, Y_n^m(\theta) &= -r_0^{-2}\,\delta(r - r_0)\, g(\theta),\\ \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \left[ \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} + k^2 - \frac{n(n+1)}{r^2} \right] \psi_{nm}\, Y_n^m(\theta) &= -r_0^{-2}\,\delta(r - r_0)\, g(\theta). \end{aligned}$$

Obviously, $\psi_{nm}$ must depend on $k$ and $r$, so we may pull the factor $k$ into the differentials, $\frac{\mathrm{d}}{\mathrm{d}r} = k\,\frac{\mathrm{d}}{\mathrm{d}(kr)}$, to get the differential operator $k^2\bigl[\frac{\mathrm{d}^2}{\mathrm{d}(kr)^2} + \frac{2}{kr}\frac{\mathrm{d}}{\mathrm{d}(kr)} + 1 - \frac{n(n+1)}{(kr)^2}\bigr]$ and observe $kr$ as its variable on the left; we replace $kr$ by $x$ for brevity. Applying the factor $k^{-2}$ and the spherical harmonics transform $\int_{\mathbb{S}^2} Y_n^m(\theta)\,\mathrm{d}\theta$ to the equation removes the double sum on the left (orthogonality) and decomposes the panning function $g(\theta)$ on the right into $\gamma_{nm}$

$$
\left[\frac{\mathrm{d}^2}{\mathrm{d}x^2} + \frac{2}{x}\frac{\mathrm{d}}{\mathrm{d}x} + 1 - \frac{n(n+1)}{x^2}\right]\psi_{nm} = -\left(kr_0\right)^{-2}\delta(r - r_0)\,\gamma_{nm}.
$$

We collect the $x$-independent term $\gamma_{nm}$ as a factor of the solution, $\psi_{nm} = y\,\gamma_{nm}$, and get

$$y'' + \frac{2}{x}\,y' + \left[1 - \frac{n(n+1)}{x^2}\right]y = -x_0^{-2}\,\delta(r - r_0),\tag{A.116}$$

the inhomogeneous spherical Bessel differential equation. As described, e.g., in [16, 17], the inhomogeneous differential equation can be solved by the Lagrangian *variation of the parameters* for equations of the type $y'' + p\,y' + q\,y = r$, knowing its independent homogeneous solutions $y_1 = h_n^{(2)}(x)$ and $y_2 = j_n(x)$.

It uses a solution *y* = *uy*<sup>1</sup> + *vy*<sup>2</sup> with variable parameters *u* and *v*, which upon first and second-order differentiation becomes

$$\begin{aligned} y &= u\,y_1 + v\,y_2, \qquad y' = u\,y_1' + u'y_1 + v\,y_2' + v'y_2,\\ y'' &= u\,y_1'' + 2u'y_1' + u''y_1 + v\,y_2'' + 2v'y_2' + v''y_2. \end{aligned}$$

Inserted into the equation $y'' + p\,y' + q\,y = r$, this yields

$$\begin{split} \overbrace{u\left(y_1'' + p\,y_1' + q\,y_1\right)}^{\to 0} + \overbrace{v\left(y_2'' + p\,y_2' + q\,y_2\right)}^{\to 0} + u''y_1 + 2u'y_1' + v''y_2 + 2v'y_2' + p\,(u'y_1 + v'y_2) &= r,\\ (u'y_1 + v'y_2)' + u'y_1' + v'y_2' + p\left(u'y_1 + v'y_2\right) = u'y_1' + v'y_2' + \left(\tfrac{\mathrm{d}}{\mathrm{d}x} + p\right)(u'y_1 + v'y_2) &= r. \end{split}$$

Now two functions $u$ and $v$ are to be determined from only one equation, so we may pose an additional constraint. The above equation would simplify if the term $(u'y_1 + v'y_2)$ vanished. By this and the simplified equation, we get two conditions

$$\begin{aligned} \mathrm{I}: &\quad u'y_1 + v'y_2 = 0,\\ \mathrm{II}: &\quad u'y_1' + v'y_2' = r, \end{aligned}$$

and obtain by elimination with either $\mathrm{A} = \mathrm{I}\,y_1' - \mathrm{II}\,y_1$ or $\mathrm{B} = \mathrm{I}\,y_2' - \mathrm{II}\,y_2$

$$\begin{aligned} \mathrm{A}: &\quad v'\underbrace{(y_1'y_2 - y_1y_2')}_{-W} = -r\,y_1,\\ \mathrm{B}: &\quad u'\underbrace{(y_1y_2' - y_1'y_2)}_{W} = -r\,y_2. \end{aligned}$$

So that the solution $y = u\,y_1 + v\,y_2$ uses $u = -\int \frac{r\,y_2}{W}\,\mathrm{d}x$ and $v = \int \frac{r\,y_1}{W}\,\mathrm{d}x$. In our case, we have $y_1 = h_n^{(2)}(x)$, $y_2 = j_n(x)$, $r = -x_0^{-2}\,\delta(r - r_0)$, and the Wronskian $W = y_1y_2' - y_1'y_2 = \mathrm{i}\,x^{-2}$, cf. Eq. (A.97); hence, with integration constants enforcing the physical solutions:

$$y = -h_n^{(2)}(x)\int_0^{x} \mathrm{i}\,x'^2\, j_n(x')\; x_0^{-2}\,\delta(r - r_0)\,\mathrm{d}x' \;-\; j_n(x)\int_{x}^{\infty} \mathrm{i}\,x'^2\, h_n^{(2)}(x')\; x_0^{-2}\,\delta(r - r_0)\,\mathrm{d}x'.$$

To convert $\delta(r - r_0)$ into $\delta(x - x_0)$ with $x = kr$, we use $\int \delta(x)\,\mathrm{d}x = \int \delta(r)\,\mathrm{d}r = 1$; with the integration variable replaced via $\frac{\mathrm{d}x}{\mathrm{d}r} = k$, hence $\mathrm{d}x = k\,\mathrm{d}r$, and obviously by $\int \delta(x)\,k\,\mathrm{d}r = \int \delta(r)\,\mathrm{d}r$ we find $\delta(r - r_0) = k\,\delta(x - x_0)$,

$$\begin{split} y &= -h_n^{(2)}(x)\int_0^{x} \mathrm{i}\,x'^2\, j_n(x')\, k\, x_0^{-2}\,\delta(x' - x_0)\,\mathrm{d}x' - j_n(x)\int_{x}^{\infty} \mathrm{i}\,x'^2\, h_n^{(2)}(x')\, k\, x_0^{-2}\,\delta(x' - x_0)\,\mathrm{d}x'\\ &= -\mathrm{i}\,k \begin{cases} h_n^{(2)}(x)\, j_n(x_0), & \text{for } x \ge x_0,\\ j_n(x)\, h_n^{(2)}(x_0), & \text{for } x \le x_0. \end{cases} \end{split}$$

After re-substituting $x = kr$ and expanding $\psi_{nm} = y\,\gamma_{nm}$ over the spherical harmonics $p = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} \psi_{nm}\, Y_n^m(\theta)$, the solution becomes:

$$p = -\mathrm{i}\,k \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \gamma_{nm}\, Y_n^m(\theta) \begin{cases} h_n^{(2)}(kr)\, j_n(kr_0), & \text{for } r \ge r_0,\\ j_n(kr)\, h_n^{(2)}(kr_0), & \text{for } r \le r_0. \end{cases} \tag{A.117}$$

*Green's function*. For the Green's function at the direction $\theta_0$, the angular panning function is expanded as $\gamma_{nm} = Y_n^m(\theta_0)$, and we get the formulation of the Green's function in terms of spherical basis functions:

$$G = -\mathrm{i}\,k \sum\_{n=0}^{\infty} \sum\_{m=-n}^{n} Y\_n^m(\theta\_0) \, Y\_n^m(\theta) \begin{cases} h\_n^{(2)}(kr) \, j\_n(kr\_0) & \text{for } r \ge r\_0, \\ j\_n(kr) \, \, h\_n^{(2)}(kr\_0) & \text{for } r \le r\_0. \end{cases} \tag{A.118}$$
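Equation (A.118) can be checked numerically against the closed-form free-field Green's function $G = e^{-\mathrm{i}k\|\boldsymbol{r} - \boldsymbol{r}_0\|}/(4\pi\|\boldsymbol{r} - \boldsymbol{r}_0\|)$. The sketch below is an illustrative implementation (assuming SciPy; function names are my own) that sums the series over $n$ only, collapsing the sum over $m$ with the spherical-harmonics addition theorem $\sum_m Y_n^m(\theta_0)\,Y_n^m(\theta) = \frac{2n+1}{4\pi}\,P_n(\theta_0^\mathrm{T}\theta)$:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def greens_modal(k, r, r0, cos_theta, N=40):
    # Eq. (A.118), with the sum over m collapsed by the addition theorem:
    # sum_m Y_n^m(theta0) Y_n^m(theta) = (2n+1)/(4*pi) * P_n(cos Theta)
    r_lo, r_hi = min(r, r0), max(r, r0)
    n = np.arange(N + 1)
    jn = spherical_jn(n, k * r_lo)
    hn2 = spherical_jn(n, k * r_hi) - 1j * spherical_yn(n, k * r_hi)
    return -1j * k * np.sum((2 * n + 1) / (4 * np.pi) * jn * hn2
                            * eval_legendre(n, cos_theta))

def greens_closed(k, r, r0, cos_theta):
    # free-field Green's function e^{-ikd}/(4 pi d), e^{+i omega t} convention
    d = np.sqrt(r**2 + r0**2 - 2 * r * r0 * cos_theta)
    return np.exp(-1j * k * d) / (4 * np.pi * d)

# example: source at r0 = 2, receiver at r = 1, 0.3 rad apart, k = 1
assert abs(greens_modal(1.0, 1.0, 2.0, np.cos(0.3))
           - greens_closed(1.0, 1.0, 2.0, np.cos(0.3))) < 1e-8
```

The truncation at $N = 40$ is generous here: the terms decay super-exponentially once $n$ exceeds $kr_<$, because $j_n(kr_<)$ collapses for high orders.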

*Plane waves/far-field approximation*. Equation (6.7) in Sect. 6.3.1 formulates plane waves $p = e^{\mathrm{i}k\,\theta_0^\mathrm{T} \boldsymbol{r}}$ as the far-field limit $p = 4\pi \lim_{r_0\to\infty} r_0\, e^{\mathrm{i}kr_0}\, G = \lim_{r_0\to\infty} \frac{4\pi}{-\mathrm{i}k\, h_0^{(2)}(kr_0)}\, G$. Using Eq. (A.117), a distribution of plane waves driven by the gains $g(\theta) = \sum_n \sum_m \gamma_{nm} Y_n^m$ consequently yields, with $\lim_{r_0\to\infty} h_n^{(2)}(kr_0) = \mathrm{i}^n\, h_0^{(2)}(kr_0)$,

$$p = 4\pi \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr) \left[ \lim_{r_0 \to \infty} \frac{h_n^{(2)}(kr_0)}{h_0^{(2)}(kr_0)} \right] Y_n^m(\theta)\,\gamma_{nm}$$

$$= 4\pi \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \mathrm{i}^n\, j_n(kr)\, Y_n^m(\theta)\,\gamma_{nm},\tag{A.119}$$

or for a single plane-wave direction $\gamma_{nm} = Y_n^m(\theta_0)$,

$$p = 4\pi \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \mathrm{i}^{n}\, j_{n}(kr)\, Y_{n}^{m}(\theta)\, Y_{n}^{m}(\theta_{0}). \tag{A.120}$$
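The plane-wave expansion of Eq. (A.120) can be verified the same way: summing over $m$ with the addition theorem leaves the classical Rayleigh expansion $e^{\mathrm{i}kr\cos\Theta} = \sum_n (2n+1)\,\mathrm{i}^n\, j_n(kr)\, P_n(\cos\Theta)$, where $\cos\Theta = \theta_0^\mathrm{T}\theta$. A minimal check (assuming SciPy; values are arbitrary examples):

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

def plane_wave_modal(k, r, cos_theta, N=40):
    # Eq. (A.120) after summing over m with the addition theorem
    n = np.arange(N + 1)
    return np.sum((2 * n + 1) * 1j**n * spherical_jn(n, k * r)
                  * eval_legendre(n, cos_theta))

# compare with the plane wave itself, p = exp(i*k*r*cos(Theta))
k, r, ct = 2.0, 1.5, 0.4
assert abs(plane_wave_modal(k, r, ct) - np.exp(1j * k * r * ct)) < 1e-10
```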

#### **A.7 Sine and Tangent Law**

The sine and tangent law [18] observes the sound pressure of plane waves at the two locations $x = 0$, $y = \pm d$ at ear distance, in order to simulate the ear signals. A plane wave from the left half of the room, from the angle $\varphi > 0$, first arrives at the left ear, $p_\mathrm{left} = e^{\mathrm{i}\,kd\sin\varphi}$, and later at the right one, $p_\mathrm{right} = e^{-\mathrm{i}\,kd\sin\varphi}$. The phase difference is $2\,kd\sin\varphi$.

A superimposed pair of plane waves from the directions $\pm\alpha$ arrives at the left ear as $p_\mathrm{left} = g_1\, e^{\mathrm{i}\,kd\sin\alpha} + g_2\, e^{-\mathrm{i}\,kd\sin\alpha}$, and at the right ear as $p_\mathrm{right} = g_1\, e^{-\mathrm{i}\,kd\sin\alpha} + g_2\, e^{\mathrm{i}\,kd\sin\alpha} = p_\mathrm{left}^*$. The phase difference $2\angle p_\mathrm{left} = 2\arctan\frac{(g_1 - g_2)\sin(kd\sin\alpha)}{(g_1 + g_2)\cos(kd\sin\alpha)}$ can be linearized for long wavelengths $kd \to 0$ to $2\arctan\bigl[kd\,\frac{g_1 - g_2}{g_1 + g_2}\sin\alpha\bigr] \approx 2\,\frac{g_1 - g_2}{g_1 + g_2}\, kd\sin\alpha$.

Comparing the phase difference of the single plane wave with that of the superimposed pair, $2\,kd\sin\varphi = 2\,kd\,\frac{g_1 - g_2}{g_1 + g_2}\sin\alpha$, one arrives at the sine law

$$
\sin \varphi = \frac{g_1 - g_2}{g_1 + g_2} \sin \alpha.
$$
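The derivation above is easy to check numerically: for a small $kd$, the phase difference of the superimposed ear signals should match $2\,kd\sin\varphi$ with $\sin\varphi$ predicted by the sine law. A small sketch with assumed example values for $kd$, $\alpha$, and the gains:

```python
import numpy as np

kd = 0.01                      # kd -> 0, so the linearization holds
alpha = np.radians(30.0)       # plane-wave pair from +-alpha
g1, g2 = 0.8, 0.4              # arbitrary example gains

# superimposed ear signal at the left ear; phase difference is 2*angle(p_left)
p_left = (g1 * np.exp(1j * kd * np.sin(alpha))
          + g2 * np.exp(-1j * kd * np.sin(alpha)))
phase_diff = 2 * np.angle(p_left)

# sine-law prediction: the single plane wave from phi has phase diff 2*kd*sin(phi)
sin_phi = (g1 - g2) / (g1 + g2) * np.sin(alpha)
assert abs(phase_diff - 2 * kd * sin_phi) < 1e-6
```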

If we claim our hearing to possess the ability to not only estimate the interaural phase difference but also its derivative with regard to head rotation, $\frac{\partial}{\partial\delta}$, we arrive at a pair of binaural features $(\Delta_\varphi, \frac{\partial \Delta_\varphi}{\partial\delta}) = 2\,kd\,(\sin\varphi, \cos\varphi)$ that should match the one of the stereophonic plane-wave pair. For stereo, the phase difference differentiated with regard to head rotation is $2\,kd\,\frac{g_1 \frac{\partial}{\partial\delta}\sin(\alpha+\delta)|_{\delta=0} + g_2 \frac{\partial}{\partial\delta}\sin(-\alpha+\delta)|_{\delta=0}}{g_1 + g_2} = 2\,kd\,\frac{(g_1 + g_2)\cos\alpha}{g_1 + g_2} = 2\,kd\cos\alpha$, which yields the feature pair $(\Delta_{\pm\alpha}, \frac{\partial \Delta_{\pm\alpha}}{\partial\delta}) = 2\,kd\,(\frac{g_1 - g_2}{g_1 + g_2}\sin\alpha, \cos\alpha)$. In polar coordinates, the radius of the two value pairs differs: while the plane wave yields a value pair at the radius $2\,kd$ in the binaural feature space, the stereophonic pair only reaches the radius $2\,kd$ at $\pm\alpha$, at which one of the two gains must vanish, and amplitude panning can be used to connect these two points $2\,kd\,(\pm\sin\alpha, \cos\alpha)$ by a straight line. The plane wave with the most similar feature pair must lie at the same polar angle. We may therefore equate the tangents of both feature pairs, $\frac{\Delta_\varphi}{\partial\Delta_\varphi/\partial\delta} = \frac{\Delta_{\pm\alpha}}{\partial\Delta_{\pm\alpha}/\partial\delta}$, and obtain the tangent law:

$$
\tan \varphi = \frac{g_1 - g_2}{g_1 + g_2} \tan \alpha.
$$
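In practice, the tangent law is used in the reverse direction: given the loudspeaker base angle $\alpha$ and a desired panning angle $\varphi$, one solves for the gain pair. A minimal sketch (the function name and the power normalization $g_1^2 + g_2^2 = 1$ are illustrative choices, not prescribed by the law itself):

```python
import numpy as np

def tangent_law_gains(phi, alpha):
    # solve tan(phi) = (g1 - g2)/(g1 + g2) * tan(alpha), with g1^2 + g2^2 = 1
    t = np.tan(phi) / np.tan(alpha)   # desired ratio (g1 - g2)/(g1 + g2)
    g1, g2 = 1.0 + t, 1.0 - t         # any pair with this ratio works
    norm = np.hypot(g1, g2)           # normalize to constant total power
    return g1 / norm, g2 / norm

g1, g2 = tangent_law_gains(np.radians(10.0), np.radians(30.0))
# the gains reproduce the tangent law and are power-normalized
assert np.isclose((g1 - g2) / (g1 + g2) * np.tan(np.radians(30.0)),
                  np.tan(np.radians(10.0)))
assert np.isclose(g1**2 + g2**2, 1.0)
```

Panning to $\varphi = 0$ returns equal gains, and $\varphi = \pm\alpha$ mutes one loudspeaker, as required by the feature-space argument above.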

If, conversely, one does not search for the plane-wave angle whose features are closest to those of a given amplitude difference, but for the amplitude difference whose features are closest to those of a given plane wave, then the sine law is the best match, even in the two-dimensional feature space.

#### **References**

